1 SUMMARY


Prior  to  the  Proceed  program,  the  main  challenges  preventing  practical  demonstrations  and  use  of 
Fully  Homomorphic  Encryption  (FHE)  were  efficiency  and  scalability.  At  the  start  of  the 
Program,  the  state-of-the-art  FHE  implementations  were  both  inefficient  and  not  scalable.  Our 
work  in  Scalable  Implementation  of  Primitives  for  Homomorphic  EncRyption  (SIPHER)  has 
brought  FHE  into  the  realm  of  practice,  bringing  several  orders  of  magnitude  runtime 
improvement,  and  resulting  in  FHE  implementations  that  can  be  executed  on  single  and 
multicore  computers  (including  iPhones).  Furthermore,  our  implementation  of  an  FHE  hardware 
accelerator  on  a  Virtex  7  Field  Programmable  Gate  Array  (FPGA)  can  speed  up  core  FHE 
functions  by  over  three  orders  of  magnitude. 

Previous  FHE  schemes  were  inefficient  because  the  underlying  algorithms  and  their 
implementations  take  too  long  to  run  at  an  appropriate  level  of  assured  security.  Similarly,  these 
FHE  schemes  were  not  scalable  because  memory  requirements  for  encrypting  practical-length 
messages  with  a  reasonable  level  of  security  exceed  the  abilities  of  highly  parallel  computation 
devices  like  FPGAs.  These  issues  are  driven  by  several  factors: 

■  The  very  large  keys  required  for  an  assured  level  of  security  and  large  expansion  of 
unencrypted  plaintext  messages  to  encrypted  ciphertext. 

■  The  large  computation  depth  needed  for  Bootstrapping/Recryption  circuits  (an  efficiency 
bottleneck  of  FHE  schemes). 

■  The  lack  of  scalable  and  highly  optimized  implementations  of  basic  modulus  ring  operations, 
which  are  building  blocks  used  across  many  lattice  FHE  schemes. 

Our  activities  culminated  in  many  orders  of  magnitude  improvement  for  these  bottlenecks.  We 
achieved  this  revolutionary  improvement  by  significantly  advancing  the  state  of  the  art  in  a 
number  of  independent  focus  areas: 

■  Multiple  foundational  improvements  in  the  underlying  FHE  scheme  for  more  efficient  and 
scalable  implementations  of  FHE  operations.  These  improvements  include  a  new  approach  to 
FHE  Recryption,  and  the  use  of  modulus  and  ring  reduction  to  limit  ciphertext  expansion. 

■  Parallelizable,  efficient  algorithm  design  for  scalable  implementations  of  basic  computational 
primitives  at  the  core  of  lattice  FHE  schemes  improving  runtime  of  all  FHE  operations. 

■  Advanced  code  development  approach  for  efficient 
and  flexible  embedded  and  FPGA  implementations. 

Figure  1  shows  the  layered  SIPHER  approach.  We 
provide  software  interfaces  for  our  optimized  basic 
FHE  operations.  This  lets  users  construct  general 
applications  computing  on  encrypted  data.  Core 
lattice-based  primitives  form  the  heart  of  our  FHE 
implementations.  Our  modular  approach  allowed  us 
to:  1)  construct  and  experimentally  modify  multiple 
implementations  of  FHE  operations  and  2)  easily 
deploy  code  on  FPGA  hardware  to  run  the  primitives 
on  cost-effective,  massively  parallel  hardware, 
providing 3 orders of magnitude improvement in basic FHE operation runtimes.

Figure 1: Layered SIPHER approach. (Layers, from top to bottom: external users of the SIPHER FHE implementation; a thin software layer that manages high-level use of SIPHER; FHE operations (Encrypt, etc.); a software layer that manages use of lattice-based primitives for FHE operations; lattice-based primitives; a thin software layer that manages hardware operations for lattice-based primitives; SIPHER on hardware.)

2 INTRODUCTION

This  report  is  organized  as  follows: 

In  Section  2  we  introduce  our  work  and  describe  our  NTRU-based  FHE  cryptosystem  that 
motivates  our  design  choices  in  the  SIPHER  library.  We  introduce  our  two  demonstration 
applications:  Multiparty  Voice  over  IP  (VOIP)  Teleconferencing  and  FHE  based  Keyword 
Search  of  Encrypted  Documents.  We  finally  introduce  our  work  in  FPGA  hardware  acceleration 
of  FHE  Primitives. 

In  Section  3  we  discuss  our  Methods,  Assumptions,  and  Procedures.  Specifically  we  discuss  our 
methodology  of  a  three  pronged  research  approach:  theory,  software  design,  and  hardware 
acceleration.  We  then  detail  the  SIPHER  crypto  system  issues  and  developmental  approaches 
from  the  theoretical,  software,  and  hardware  perspectives.  We  discuss  the  issues  of  FHE 
parameter  selection  for  VOIP  and  Encrypted  Keyword  Search.  Finally,  we  discuss  approaches  to 
parallelism  for  accelerating  both  our  software  and  hardware  implementations. 

In Section 4 we discuss implementation details of SIPHER from the point of view of theory, software and hardware. We then present key experimental results, including basic SIPHER software library operation, VOIP and Keyword Search performance, and hardware acceleration speedup.

Section  5  is  a  discussion  of  our  insights  and  conclusions. 

Section 6 contains our recommendations for future work to further the applicability of our FHE efforts towards widespread practice.

2.1  Practical  Somewhat  and  Fully  Homomorphic  Encryption  (SHE  and 
FHE) 

Recent  breakthroughs  in  Fully  Homomorphic  Encryption  (FHE)  have  shown  that  it  is 
theoretically  possible  to  securely  run  arbitrary  computations  over  encrypted  data  without 
decrypting the data [1], [2]. Known FHE schemes have several properties that make them attractive for addressing pressing cybersecurity issues. For one, FHE schemes are believed to be very secure. The security of FHE schemes is derived from the hardness of mathematical problems that are currently believed to be hard to solve even with quantum computing devices [3]. Known FHE schemes are consequently labelled as post-quantum, meaning that there are no known computationally efficient algorithms that can practically break these schemes, even when executed on quantum computing devices. In addition to security, a practical FHE capability would be game-changing for
cybersecurity  researchers  and  practitioners.  With  practical  FHE,  sensitive  data  could  be 
encrypted  and  placed  into  a  low  cost  cloud  computing  environment  for  processing  without 
having  to  share  decryption  keys.  This  could  greatly  reduce  the  operational  costs  of  highly 
regulated  industries  such  as  medical,  legal,  financial  and  government  industries  where  regulatory 
compliance  restricts  the  ability  to  outsource  computation  to  low  cost  cloud  computing 
environments.  Practical  FHE  could  also  greatly  reduce  the  impact  of  insider  attacks  by  greatly 
restricting  who  can  access  sensitive  data  within  an  organization,  but  still  permitting  processing  of 
this  data. 

Despite  the  attractiveness  of  known  FHE  schemes,  there  are  practical  limitations  that  have 
prevented their broad practical use. As indicated in early implementation research [4], runtime is a major obstacle to be overcome before homomorphic encryption technologies become practical. Solutions to FHE runtime challenges have been explored through several means, including improving the theoretical efficiency of the underlying scheme [5]-[9] and developing more efficient implementations of these schemes [10]-[16]. Despite these advances in FHE schemes and implementations, there have been few results on the application of these technologies.

Many of the implementation-focused papers have used the homomorphic evaluation of the AES circuit as a benchmark [10], [13], [14]. Beyond this, [17] uses the HElib library [15] to support statistical operations such as linear regression.

2.2  The  Scalable  Implementation  of  Primitives  for  Homomorphic 
EncRyption  (SIPHER)  Library  for  SHE  and  FHE 

In  this  report  we  discuss  our  experience  designing  a  general  lattice  encryption  library  called 
SIPHER  (Scalable  Implementation  of  Primitives  for  Homomorphic  EncRyption)  to  support  both 
limited  depth  computations  for  Somewhat  Homomorphic  Encryption  (SHE)  and  full  depth 
computations  for  FHE.  We  discuss  using  this  library  to  support  an  encrypted  end-to-end  VoIP 
teleconferencing  application  on  an  embedded  processor  (iPhone).  We  also  discuss  using  the 
same  library  to  support  Encrypted  Keyword  Search  (EKS)  on  single  and  multicore  Linux 
computers.  Finally,  we  discuss  accelerating  the  single  core  EKS  with  an  attached  FPGA  based 
FHE  accelerator,  showing  that  a  single  Virtex  7  FPGA  will  execute  FHE  primitives  1600x  faster 
than  a  single  core  and  35x  faster  than  a  64  core  system. 

Unlike prior libraries, SIPHER is intended to be as adaptable and extensible as possible, with a modular software architecture that supports rapid prototyping of the library, easier integration of the library into broader computing infrastructures, and increased parallelism. Our motivation for the SIPHER library is that, while prior homomorphic encryption implementations have reduced absolute runtime, limited attention has been paid to the software engineering issues that need to be addressed for these libraries to be flexibly adapted to application contexts.

We implement in software specialized lattice primitives such as Ring Addition, Ring Multiplication and the Chinese Remainder Transform (CRT). We use our primitive implementations to construct the FHE operations of Key Generation (KeyGen), Encryption (Enc), Decryption (Dec), Evaluation Addition (EvalAdd), Evaluation Multiplication (EvalMult) and Bootstrapping (Boot). We use supporting Modulus Reduction (ModReduce), Ring Reduction (RingReduce) and Key Switching (KeySwitch) operations to augment the EvalMult operation and support larger-depth computations before Bootstrapping or decreasing the security of our scheme. Finally, Recryption (Bootstrapping) refreshes a ciphertext that has previously been operated on, in order to enable further processing. These primitives are shown schematically in Figure 2.

Figure 2: SIPHER lattice library primitives


Although  SIPHER  can  support  general  FHE  capabilities,  we  can  also  target  our  design  on  a  more 
focused  SHE  capability  which  supports  the  execution  of  limited-depth  programs  on  encrypted 
data.  Known  FHE  schemes  are  built  from  SHE  schemes  with  the  addition  of  Bootstrapping  to 
extend  the  depth  of  computation  supported  by  SHE.  Our  Bootstrapping  approach  is  based  on 
[18]. We particularly focus our SHE/FHE design and implementation efforts on a variant of the LTV scheme [8], which is itself based on NTRU [19]. We also focus on implementations for
enterprise  computing  environments  with  multi-core  x86  computing  infrastructure  and  hardware 
acceleration  on  FPGAs  to  support  parallelism. 

Although  it  is  possible  to  support  parallelism  at  multiple  levels  in  our  design,  we  focus  on 
providing  parallelism  using  the  “double-CRT”  representations  of  ciphertext  [10].  Prior  SHE  and 
FHE implementation designs [4], [11], [20], [21], for the most part, rely on single-threaded execution on commodity CPU-type hardware, partially due to the difficulty of, or lack of native support for, multi-threaded execution with underlying software libraries [22], [23]. Although there have been prior implementation efforts that support double-CRT representations to reduce runtime by reducing the bit-widths of ciphertexts to be less than 64 bits, ours is the first implementation that leverages double-CRT representations to reduce runtime through parallelism.

2.3  Application  of  SHE  to  multiparty  Voice  over  IP  Teleconferencing 

Despite our design focus on a general SHE/FHE library for enterprise environments, we show how our library is adaptable to support a practical end-to-end encrypted VoIP teleconferencing prototype running on commodity iOS-based iPhones. The basis of this application is that there has been an unmet technological need for a scalable capability that lets multiple geographically distributed people converse simultaneously as a group over commercial data networks. This need, until now, has been partially served either by physically secure dedicated point-to-point communication links, as provided by dedicated circuits, or by physically unsecured point-to-point communication links that are made secure with point-to-point encryption technologies [24], [25]. These prior approaches to secure communication are either not scalable or have not adequately addressed several important vulnerabilities. Physically secure communication links are not feasible over broad geographic areas and are not accessible by the general population. Point-to-point encryption solutions do not scale because when there are more than a handful of participants in a teleconference call, a large number of point-to-point communication links are practically difficult to set up and maintain, often leading to latency issues that degrade the quality of user experiences. For these reasons, there is a need for a technology that provides scalable, secure and practical teleconferencing services which can be used to host multi-party negotiations, planning, education, and information distribution of a sensitive nature.

VoIP  can  provide  a  fundamentally  scalable  and  practical  approach  to  teleconferencing,  especially 
with  the  advent  of  global  packet-switched  information  networks.  Unfortunately,  existing  VoIP 
teleconferencing  capabilities  such  as  GoToMeeting,  Skype  and  Mumble  among  others  have  not 
been  both  scalable  and  secure  against  data  leaking  to  adversaries  who  wish  to  snoop  on  private  or 
even  proprietary  group  communication.  For  example,  these  existing  VoIP  technologies  have 
been  vulnerable  to  man-in-the-middle  attacks  of  various  types  [26].  The  majority  of  widely  used 
existing  VoIP  teleconferencing  capabilities  require  a  central  VoIP  server  to  mix  all  of  the  VoIP 
signals from clients, which are then sent back to the clients. The VoIP mixing operation, which merges the VoIP streams from the clients, has until now needed to be performed in the clear, on unencrypted VoIP data. This creates an opportunity for adversaries to snoop on otherwise protected VoIP data if the adversaries gain access to the VoIP server. This vulnerability is a practical security challenge because VoIP teleconferencing servers are often hosted in a semi-secure environment, such as commodity cloud providers like Amazon AWS or Microsoft Azure, which might not be completely trusted. This induces an unfortunate trade-off for this architecture: either all participants hold group conversations in the clear in untrusted environments, or they pay the higher cost of maintaining access to a trusted VoIP server if secure teleconferencing is needed. The limitation of required server trust has until now prevented VoIP teleconferencing technologies from being used in regulated industries where privacy is of utmost concern.

Taken together, these technological deficiencies and practical needs point to a need for a VoIP teleconferencing capability where VoIP data is never decrypted except on the clients which have access to decryption keys. Thus, unlike previous VoIP attack analyses which focus on signaling attacks [27], [28] during VoIP call set-up, we are particularly interested in protecting against man-in-the-middle attacks, in particular against compromise at VoIP servers.

As  part  of  our  project  we  developed  a  secure,  scalable  and  practical  method  to  protect  against  the 
leakage  of  sensitive  VoIP  teleconferences  even  on  VoIP  teleconference  servers  that  have  been 
fully  compromised.  The  basis  of  our  approach  is  to  modify  the  SIPHER  library  to  support  an 
efficient, additive homomorphic encryption scheme. Teleconferencing clients encode their voice samples with an additive encoding scheme, encrypt their encoded voice data with an additive homomorphic encryption scheme, and send their encrypted voice samples to a mixer, which performs an encrypted homomorphic addition on the encrypted voice and sends the results back to the clients. The clients then decrypt, decode and play back the result. Our scheme relies on the pre-sharing of a common private key for an additive homomorphic encryption scheme, but it is possible in principle to generalize beyond this pre-shared key design.

We  modified  the  SIPHER  library  and  integrated  it  with  an  existing  VoIP  teleconferencing 
application  to  provide  this  capability  on  commodity  iPhone  clients  and  the  current  lowest-cost 
Amazon  EC2  server.  We  describe  the  design  tradeoffs  that  we  made  to  provide  this  capability 
and focus on novel VoIP encoding schemes that make the mixing of encrypted signals possible
at  the  VoIP  server.  We  discuss  related  engineering  tradeoffs  we  make  so  this  capability  provides 
relatively  high  sound  quality  with  full-duplex  100  kbs  data  rates.  This  initial  implementation  is 
intended  to  be  a  proof-of-concept  capability,  with  the  possibility  of  improving  upon  this 
technology  with  existing  key  management  technologies  [29]  and  session  initiation  technologies 
[30]  with  additional  engineering  investment  and  little  or  no  research  risk. 

A  preliminary  version  of  some  of  the  material  in  this  report  was  published  [31],  but  without  full 
discussion  of  the  software  engineering  and  design  tradeoffs  of  the  SIPHER  library,  and  without 
any  discussion  of  the  end-to-end  encrypted  VoIP  application. 

2.4  Application  of  FHE  to  keyword  search  of  encrypted  documents 

As  a  demonstration  of  our  FHE  capability,  we  focused  on  the  e-mail  filtering  problem.  We 
selected  a  use  model  where  users  encrypt  e-mail  messages  on  their  (trusted)  computer,  and  the 
messages  are  sent  to  an  untrusted  mail  server  for  forwarding  to  a  destination.  In  this  use  model, 
the  mail  server  also  acts  as  a  “border  guard”:  each  e-mail  message  is  checked  for  the  presence  of 
certain strings, and is only passed on to its destination if those strings are absent. Because the e-mail messages are private and for that reason encrypted, the border guard must perform this checking without decrypting the messages. Such a use model might appear in a corporate
enclave  where  users  need  the  ability  to  send  encrypted  messages  out  of  the  enclave  and  over  the 
Internet,  yet  the  administrators  of  the  enclave  need  the  ability  to  ensure  that  certain  information 
is  not  allowed  beyond  the  enclave.  Interestingly  enough,  this  application  was  one  of  the  few 
applications  identified  during  the  Proceed  program  as  having  an  appropriate  security  model  for 
practical  application  for  FHE. 

2.5 Hardware acceleration of FHE primitives

One  of  our  main  contributions  to  the  Proceed  program  has  been  the  development  of  FPGA  based 
hardware primitives to accelerate computation on encrypted data. Ciphertexts in our scheme are represented as rectangular matrices of 64-bit integers. This bounding of the operand sizes has allowed us to take advantage of modern code generation tools developed by MathWorks to generate VHDL code for FPGA circuits directly from Simulink models. Furthermore, the implicit parallelism of the scheme allows for large amounts of pipelining in the implementation in order to achieve efficient throughput. The resulting VHDL is integrated into an AXI4 bus “Soft System on Chip” using Xilinx Platform Studio and a MicroBlaze soft core processor running on a Virtex 7 VC707 evaluation board for use as an attached processor over Gigabit Ethernet. The resulting system can also be hosted directly in a computer using the PCI Express (PCIe) interface for direct access by the host CPU (eliminating the need for the MicroBlaze processor).

3 METHODS, ASSUMPTIONS, AND PROCEDURES

3.1  Our  methodology  (the  three  pronged  research  approach) 

Our  methodology  for  SIPHER  was  to  adopt  a  three-pronged  research  approach,  as  shown  in 
Figure  3.  Our  work  falls  into  three  specific  categories.  The  first  focus  is  on  the  theoretical  aspects 
of  FHE.  Our  FHE  expert  provides  the  latest  in  theoretical  innovations  as  applied  to  our 
constrained  design  space.  The  second  focus  is  on  implementation  of  software  designs  that  will 
support  practical  FHE  operations.  Often  algorithms  are  developed  by  the  theory  group  and 
passed  on  to  the  design  group,  iterating  back  and  forth  as  improvements  are  found.  The  third 
thrust  area  is  in  highly  parallelized  implementations,  for  example  on  multi-core  processors,  but 
also  in  the  extreme  case  with  FPGA  implementations.  We  use  a  common  development 
environment  and  code  generation  tools  to  rapidly  prototype  our  FHE  operations  in  these  highly 
parallelized implementations. The constraints imposed by the implementations were fed back to the theory group, where they drove the search for more efficient implementations of the algorithms from first principles.


Figure  3:  The  multi-pronged  research  approach  of  SIPHER 


3.2  Procedure  of  approach 

Over  the  course  of  our  four  year  program,  developments  generally  cascaded  from  theory  to 
design  then  to  acceleration.  Each  research  thrust  was  generally  six  months  behind  the  progress  of 
the  one  before  it.  Figure  4  shows  a  timeline  of  the  various  milestones  of  the  program  with  the 
major  contribution  of  each  six  month  period.  The  initial  year  focused  on  basic  SHE  algorithms 
and  early  hardware  implementation  of  key  primitive  operations.  This  provided  a  basic 
functionality,  and  also  required  addressing  implementation  issues  such  as  efficient  modulo 
arithmetic  in  hardware,  and  ways  to  limit  the  number  of  bits  required  for  operations.  The  second 
year  focused  on  scaling  up  the  software  implementations,  and  implementing  the  first  FHE 
Processing  Unit  (FHEPU)  on  an  FPGA  (first  a  Virtex  6,  then  later  a  Virtex  7).  The  third  year 
added the breakthrough theory of power-of-2 bootstrapping, and also focused work on the two applications mentioned in this report (secure VOIP teleconferencing and encrypted keyword search). The final year focused on accelerating the keyword search (KWS) with FHE (including bootstrapping) using FPGA acceleration.

Figure 4: A timeline of SIPHER advances over the duration of the program


3.3  SIPHER  cryptosystem 
3.3.1  SIPHER  cryptosystem  design 

We begin by describing the overall target cryptosystem of the SIPHER library. We use
this  cryptosystem  based  on  [8]  to  motivate  design  and  engineering  tradeoffs  in  our  SIPHER 
library,  even  though  we  target  SIPHER  to  be  as  modular  and  adaptable  as  possible  to  support 
variations  of  this  scheme. 


We  begin  with  some  mathematical  preliminaries  based  on  our  focus  on  power-of-2  cyclotomics. 
For $n$ a power of 2, we define the ring $R = \mathbb{Z}[x]/(x^n + 1)$ (i.e., integer polynomials modulo $x^n + 1$), and for any positive integer $q$, define $R_q = R/qR$ (i.e., integer polynomials modulo $x^n + 1$, with mod-$q$ coefficients). The message space is $R_p$ for some integer $p \geq 2$, and most arithmetic operations are performed modulo some $q \gg p$ that is relatively prime with $p$. Fast addition and multiplication in $R_q$ can be performed by using the mod-$q$ CRT representation of elements.
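As a minimal sketch of why the CRT representation helps, the following Python compares schoolbook multiplication in the ring with coordinate-wise multiplication of CRT vectors. The toy parameters (n = 8, the prime q = 998244353 with primitive root 3) and all function names are assumptions chosen for illustration; they are not SIPHER parameters or its API, and the transform here is a naive quadratic-time evaluation rather than a fast one.

```python
# Toy parameters, assumed for illustration only (far too small to be secure).
N = 8                                # ring dimension n, a power of 2
Q = 998244353                        # prime q with q = 1 mod 2n; 3 is a primitive root
PSI = pow(3, (Q - 1) // (2 * N), Q)  # primitive 2n-th root of unity mod q

def negacyclic_mul(a, b):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1)."""
    out = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                out[k] = (out[k] + ai * bj) % Q
            else:  # x^n = -1, so wrap around with a sign flip
                out[k - N] = (out[k - N] - ai * bj) % Q
    return out

def crt(a):
    """Naive mod-q CRT: evaluate a at the n roots of x^n + 1 (odd powers of PSI)."""
    return [sum(c * pow(PSI, (2 * i + 1) * j, Q) for j, c in enumerate(a)) % Q
            for i in range(N)]

def inv_crt(a_hat):
    """Inverse of crt(): recover coefficients in [0, q) from CRT values."""
    n_inv, psi_inv = pow(N, -1, Q), pow(PSI, -1, Q)
    return [n_inv * sum(v * pow(psi_inv, (2 * i + 1) * j, Q)
                        for i, v in enumerate(a_hat)) % Q
            for j in range(N)]

def crt_mul(a, b):
    """Ring product computed as a coordinate-wise product of CRT vectors."""
    return inv_crt([x * y % Q for x, y in zip(crt(a), crt(b))])
```

Once operands are kept in CRT form, each ring addition or multiplication touches every coordinate independently, which is what makes the representation attractive for parallel hardware.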


Basic  functions 

With  these  mathematical  preliminaries,  the  mathematical  description  of  the  basic  LTV-variant 
scheme  with  “least  significant  bit”  message  encoding  is  as  follows.  (Concrete  parameters  and 
implementation  discussions  are  given  later.) 

• KeyGen: choose a “short” $f \in R$ such that $f = 1 \bmod p$ and $f$ is invertible modulo $q$, and a “short” $g \in R$. Output $pk = h = g \cdot f^{-1} \bmod q$ and $sk = f$.

Note that $f$ is invertible modulo $q$ if and only if each of its mod-$q$ CRT coefficients is nonzero. The CRT coefficients of $f^{-1}$ (modulo $q$) are just the mod-$q$ inverses of those of $f$. Concretely, the “short” elements $f$ and $g$ can be chosen from discrete Gaussians. E.g., we can let $f = p \cdot f' + 1$ for some Gaussian-distributed $f'$.

• Enc($pk = h$, $\mu \in R_p$): choose a “short” $r \in R$ and a “short” $m \in R$ such that $m = \mu \bmod p$. Output $c = p \cdot r \cdot h + m \bmod q$.

Concretely, $m$ can be chosen as $m = p \cdot m' + \mu$ for a Gaussian-distributed $m'$. In some cases it may be better to choose $m$ as a zero-centered random variable congruent to $\mu$ modulo $p$.

• Dec($sk = f$, $c \in R_q$): compute $\bar{b} = f \cdot c \bmod q$, and lift it to the integer polynomial $b \in R$ with coefficients in $[-q/2, q/2)$. Output $\mu = b \bmod p$.
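The three basic functions can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not the SIPHER implementation: the parameters (n = 8, p = 2, q = 998244353) are far too small to be secure, "short" elements are drawn uniformly from {-1, 0, 1} as a stand-in for discrete Gaussians, the CRT is a naive quadratic-time evaluation, and all function names are hypothetical.

```python
import random

# Toy parameters, assumed for illustration only (far too small to be secure).
N, P = 8, 2                          # ring dimension n and plaintext modulus p
Q = 998244353                        # prime q with q = 1 mod 2n; 3 is a primitive root
PSI = pow(3, (Q - 1) // (2 * N), Q)  # primitive 2n-th root of unity mod q

def crt(a):
    """Naive mod-q CRT: evaluate a at the n roots of x^n + 1 (odd powers of PSI)."""
    return [sum(c * pow(PSI, (2 * i + 1) * j, Q) for j, c in enumerate(a)) % Q
            for i in range(N)]

def inv_crt(a_hat):
    """Inverse of crt(): recover coefficients in [0, q) from CRT values."""
    n_inv, psi_inv = pow(N, -1, Q), pow(PSI, -1, Q)
    return [n_inv * sum(v * pow(psi_inv, (2 * i + 1) * j, Q)
                        for i, v in enumerate(a_hat)) % Q
            for j in range(N)]

def ring_mul(a, b):
    """Multiply in R_q coordinate-wise in the CRT domain."""
    return inv_crt([x * y % Q for x, y in zip(crt(a), crt(b))])

def short(rng):
    """Stand-in for a discrete Gaussian sample: coefficients uniform in {-1, 0, 1}."""
    return [rng.randint(-1, 1) for _ in range(N)]

def keygen(rng):
    while True:
        f = [P * c for c in short(rng)]
        f[0] += 1                          # f = p*f' + 1, hence f = 1 mod p
        f_hat = crt(f)
        if all(f_hat):                     # invertible iff no CRT coefficient is zero
            break
    f_inv = inv_crt([pow(v, -1, Q) for v in f_hat])
    h = ring_mul(short(rng), f_inv)        # h = g * f^{-1} mod q
    return h, f

def encrypt(h, mu, rng):
    m = [P * e + u for e, u in zip(short(rng), mu)]   # m = p*m' + mu
    rh = ring_mul(short(rng), h)
    return [(P * x + y) % Q for x, y in zip(rh, m)]   # c = p*r*h + m mod q

def decrypt(f, c):
    b = ring_mul(f, c)
    lifted = [x - Q if x > Q // 2 else x for x in b]  # representatives in [-q/2, q/2)
    return [x % P for x in lifted]
```

For example, with `rng = random.Random(0)` and `h, f = keygen(rng)`, calling `decrypt(f, encrypt(h, mu, rng))` returns the bit vector `mu`, because the noise term stays far below q/2 at these sizes.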
Homomorphic operations

• EvalAdd($c_0$, $c_1$): output $c = c_0 + c_1 \bmod q$.

• EvalMult($c_0$, $c_1$): output $c = c_0 \cdot c_1 \bmod q$.

With the use of EvalMult, the decryption procedure changes slightly. Define the “degree” of ciphertexts as follows: a freshly generated ciphertext has degree 1, and the degree of $c = \mathrm{EvalMult}(c_0, c_1)$ is the sum of the degrees of $c_0$ and $c_1$. Then decryption of a degree-$d$ ciphertext $c$ is the same as above, except that we compute $\bar{b} = f^d \cdot c \bmod q$.
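To see why the extra power of the secret key is needed, the following short derivation works through the degree-2 case for two fresh ciphertexts. It is a sketch that assumes the public key has the form h = g · f^{-1} mod q, that each ciphertext is c_i = p · r_i · h + m_i mod q with m_i congruent to the message μ_i modulo p, and that all coefficients of the final product stay within [-q/2, q/2) so that the lift is exact.

```latex
\begin{align*}
f \cdot c_i &= p\, r_i\, g + f\, m_i \pmod q, \qquad i \in \{0, 1\}, \\
f^2 \cdot (c_0 \cdot c_1) &= (f\, c_0)(f\, c_1)
   = (p\, r_0\, g + f\, m_0)(p\, r_1\, g + f\, m_1) \pmod q, \\
(p\, r_0\, g + f\, m_0)(p\, r_1\, g + f\, m_1)
   &\equiv f^2\, m_0\, m_1 \equiv \mu_0\, \mu_1 \pmod p .
\end{align*}
```

The last line uses f = 1 mod p, so every term carrying a factor of p vanishes and the product of the two messages remains. The same argument with d factors gives the degree-d rule.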

Key  Switching 

Key switching converts a ciphertext $c_1$ of degree at most $d$, encrypted under secret key $f_1$, into a degree-1 ciphertext $c_2$ encrypted under a secret key $f_2$ (which may or may not be the same as $f_1$).

This requires publishing a “hint” $a_{1 \to 2} = m \cdot f_1^d \cdot f_2^{-1} \bmod q$ for a “short” $m \in R$ congruent to 1 modulo $p$. (Concretely, we can choose $m = p \cdot e + 1$ for a Gaussian-distributed $e$.)

• KeySwitch($c_1$, $a_{1 \to 2}$): output $c_2 = a_{1 \to 2} \cdot c_1 \bmod q$.

(Note that $a_{1 \to 2}$, $c_1$, $c_2$ can all be stored and operated upon in CRT form, so key switching is very efficient: just one coordinate-wise multiplication of the CRT vectors.)
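The correctness of key switching can be checked with a short derivation. It is a sketch under two assumptions: that the hint has the form a_{1→2} = m · f_1^d · f_2^{-1} mod q with m congruent to 1 modulo p, and that the product m · b below still has coefficients within [-q/2, q/2), where b denotes the lift of f_1^d · c_1 mod q (so that b mod p is the message μ).

```latex
\begin{align*}
f_2 \cdot c_2 &= f_2 \cdot a_{1 \to 2} \cdot c_1
   = m \cdot \left( f_1^{d} \cdot c_1 \right) = m \cdot b \pmod q, \\
m \cdot b &\equiv b \equiv \mu \pmod p .
\end{align*}
```

So decrypting the output with the new key and exponent 1 recovers the same message, at the cost of multiplying the noise by the short element m.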

Modulus  Reduction 

Modulus reduction converts a ciphertext from modulus $q$ to a smaller modulus $q/q'$, where $q'$ divides $q$ (and so is also relatively prime with $p$), while also reducing the underlying noise by a factor of $q'$.

The basic description is as follows: given a ciphertext $c \in R_q$, we add to it a small integer multiple of $p$ that is congruent to $-c \bmod q'$. This ensures that the underlying noise remains small, that the plaintext remains unchanged, and that the resulting ciphertext is divisible by $q'$. Then we can divide both the ciphertext and modulus by $q'$, which reduces the underlying noise term by a factor of $q'$ as well.

Note that the final step (of dividing by $q'$) implicitly multiplies the underlying message by $(q')^{-1} \bmod p$. We can either keep track of these extra factors as part of the ciphertext and correct for them as the final step of decryption, or we can just ensure that $q' = 1 \bmod p$, so that division by $q'$ does not affect the underlying message.

The following formal procedure uses the fixed (ciphertext-independent) value $v = (q')^{-1} \bmod p$, which can be computed in advance and stored.

• ModReduce($c$, $q$, $q'$):

1) compute $d = c \bmod q'$ (in coefficient form).

2) let $\delta = (v q' - 1) \cdot d \bmod (p q')$, with all of $\delta$'s entries in $[-pq'/2, pq'/2)$.

3) let $d' = c + \delta \bmod q$. In coefficient form, all the entries of $d'$ should be divisible by $q'$.

4) output $(d'/q') \in R_{q/q'}$.
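The four steps act independently on each coefficient, so they can be sketched on a plain list of integers. The following Python is a minimal illustration in coefficient form; the function names and the example moduli used to exercise it are assumptions for illustration, not part of SIPHER.

```python
def centered(x, modulus):
    """Representative of x mod modulus in [-modulus/2, modulus/2)."""
    return (x + modulus // 2) % modulus - modulus // 2

def mod_reduce(c, q, q_prime, p):
    """Coefficient-wise ModReduce from modulus q to q // q_prime."""
    assert q % q_prime == 0
    v = pow(q_prime, -1, p)            # v = (q')^{-1} mod p, precomputable
    out = []
    for coeff in c:
        d = coeff % q_prime                                   # step 1
        delta = centered((v * q_prime - 1) * d, p * q_prime)  # step 2
        d_prime = (coeff + delta) % q                         # step 3
        assert d_prime % q_prime == 0  # divisible by q' by construction
        out.append(d_prime // q_prime)                        # step 4
    return out
```

For instance, with p = 2, q' = 257 and q = 97 · 193 · 257, a coefficient holding the small value 12345 is mapped to 49: the correction added in step 2 is 248, the sum 12593 equals 257 · 49, and the parity of the value is preserved because q' = 1 mod p.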

The above is most efficient to implement when q = q_1 · · · q_t is the product of several small, pairwise relatively prime moduli; when q' is one of those moduli (say, q' = q_t without loss of generality); and when c is represented in “double-CRT” form, i.e., each of c's mod-q CRT coefficients is itself represented in (integer) CRT form as a vector of mod-q_i values, one for each i. Then the above steps can be performed as follows:

1) Computing d = c mod q_t (in coefficient form) is done by inverting the mod-q_t CRT on the vector of mod-q_t components of c (leaving the other mod-q_i components untouched).

2) Computing δ is done by just multiplying the coefficients of d by the fixed scalar (vq_t - 1) modulo pq_t, and putting the results in the desired range.

3) Adding δ to c is done by computing the double-CRT representation of δ (i.e., applying each mod-q_i CRT to δ), and adding it entrywise to c's double-CRT representation.

Note that the mod-q_t CRTs of δ and c are just the negations of each other (by design), so their sum is the all-zeros vector. Therefore, there is no need to explicitly compute the mod-q_t CRT of δ (though it can be done as a sanity check).

4) Computing d'/q_t is done by dropping the mod-q_t components in the double-CRT representation of d' (which are all zero anyway), and multiplying every mod-q_i component by the fixed scalar q_t^(-1) mod q_i. (These scalars can be computed in advance and stored.)
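
The four ModReduce steps can be illustrated on plain integer coefficient vectors. The following is a minimal sketch only: it works in coefficient form rather than double-CRT form, the function names are hypothetical, and it assumes gcd(p, q') = 1 and that q' divides q.

```python
def centered_mod(x, m):
    """Reduce x modulo m into the half-open interval [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r


def mod_reduce(c, q, q_prime, p):
    """Toy ModReduce(c, q, q') on a list of integer coefficients mod q."""
    v = pow(q_prime, -1, p)            # v = (q')^(-1) mod p, precomputable
    scalar = v * q_prime - 1           # equals -1 mod q' and 0 mod p
    out = []
    for coeff in c:
        d = coeff % q_prime                                 # step 1
        delta = centered_mod(scalar * d, p * q_prime)       # step 2
        d_prime = (coeff + delta) % q                       # step 3
        # delta cancels coeff modulo q', so the division below is exact
        assert d_prime % q_prime == 0
        out.append(d_prime // q_prime)                      # step 4
    return out


# Example with q = 97 * 101 * 103, q' = 103 and plaintext modulus p = 2
reduced = mod_reduce([123456, 5, 0], 97 * 101 * 103, 103, 2)
```

Because δ is a multiple of p, adding it does not disturb the value of each coefficient modulo p, while making every coefficient an exact multiple of q'.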

Composed  EvalMult 

We use the Key Switching, Ring Reduction and Modulus Reduction operations as supporting functions with EvalMult to improve noise management and enable more computation between calls to the Bootstrapping operation. Taken together, these form a composite operation, which we call ComposedEvalMult (CEM), consisting of the sequential execution of an EvalMult, a Key Switching and a Modulus Reduction operation.

Ring Reduction is called during some CEM operations, depending on the level of security that the ciphertext produced by the Ring Reduction operation would provide.

As Modulus Reduction operations are performed, the security provided by a ciphertext increases (as described in the next section). Ring Reduction correspondingly reduces the level of security provided by a ciphertext. We implemented our FHE library such that a minimum level of security δ' is provided at all times, and δ' is a parameter selectable by the library user. Here δ is the root Hermite factor used in the next section, for which smaller values correspond to higher security. If a call to a Ring Reduction operation will result in a level of security δ < δ', then the Ring Reduction is performed in the CEM operation.

Our conception is that, due to the ModReduction and RingReduction components of ComposedEvalMult, it is feasible to coordinate the choice of the original ciphertext width t and the scheduling of CEM operations so that the final ciphertext resulting from secure circuit evaluation, which needs to be decrypted, is only one column wide with respect to a single modulus q_i and provides a level of security at least as great as the original ciphertexts resulting from the encryption operation. More explicitly, if we need to support a depth t - 1 computation, the initial encryptions should only be t columns wide to ensure that the final ciphertext is 1 column wide. Whereas the runtimes of Encryption, EvalAdd and CEM depend on the ring dimension and depth of computation supported, the Decryption operation would hence depend only on the final ring dimension after all ring switching has been completed. If we need to decrypt a ciphertext that has multiple columns in our double-CRT representation, we could perform multiple ModReduction operations to reduce this t > 1 ciphertext until we are left with a single mod-q_i column.

Bootstrapping

The addition of the Bootstrap operation enables the noise that accumulates in our ciphertext to be “refreshed”, allowing further CEM operations to be performed until the need for the next Bootstrap “refresh”. This can be repeated a large number of times, converting our SHE scheme into an FHE scheme.

In our scheme, a ciphertext is “fresh” if it is encoded with a large modulus (i.e., at a high “tower” level t in a double-CRT representation) and at a large ring dimension. As we perform computation on the ciphertext, we iteratively perform modulus reduction operations with every ComposedEvalMult operation, and RingReduction operations scheduled with ComposedEvalMult operations to ensure that a minimum security level is always maintained.

Starting  from  a  "fresh"  large  ciphertext,  with  the  iterative  application  of  ComposedEvalMult 
operations,  we  eventually  obtain  a  resulting  ciphertext  with  a  minimal  double-CRT 
representation  that  consists  of  a  single  ciphertext  vector  (that  corresponds  to  a  single-tower  level 
-  the  base  tower  level).  The  Bootstrapping  operation  refreshes  the  ciphertext  back  up  to  a  larger 
ring  dimension  and  tower  level.  Bootstrapping  does  this  by  first  switching  to  a  larger  ring,  and 
the  largest  tower  possible.  Bootstrapping  then  performs  a  homomorphic  rounding,  thus 
consuming  several  of  the  tower  levels  to  refresh  the  noise.  As  such,  the  resulting  ciphertext 
output  by  bootstrapping  is  at  a  lower  level  than  a  ciphertext  output  by  an  encryption  operation, 
but  still  supports  computation  on  itself. 

Our design goal was to support operations with ciphertexts as large as t=32 and n=16384. In practice, this is at a lower level of security than would be justifiable for high-security applications, and a maximum level of t=16 and n=16384 was seen as a practical limit. Due to bit-overflow limitations of our ciphertext encoding in both our 64-bit CPU and our FPGA implementations, we were practically limited to ciphertext moduli of less than 64 bits, and subsequently were unable to support bootstrapping larger ring dimensions within the remaining scope of the program.

We found that the bootstrapping operation consumed from 6 levels of a ciphertext's tower for n=512 up to 12 levels for n=16384. As such, if we support a maximum of 16 levels in practical use of our scheme, we can support depth 4 computations between bootstrapping activities.

The  basis  of  our  bootstrapping  approach  comes  from  a  new  approach  to  homomorphic  rounding. 
This  approach  to  bootstrapping  is  described  in  detail  in  [18].  We  provide  a  high-level  overview 
of  this  operation  here,  simplified  for  our  restriction  to  power-of-2  rings. 

This  Bootstrap  operation  has  the  following  steps: 

1) Round the ciphertext: For each entry v for residue i, we output round(v · q/q_i), where the inner expression is rational, and “round” means taking the nearest integer. Generally q = 2^l is chosen experimentally, but as small as possible.

2) Convert the plaintext modulus: This is a null operation under our simplifying assumptions.

3) Lift the ciphertext and plaintext moduli: This is also a null operation under our simplifying assumptions.

4) Scale the ciphertext: We scale up the ciphertext by a Q/q' factor (rounding to nearest integers in the power basis), and embed into dimension N (new ring dimension) as well. The plaintext modulus is still q'.

5) Compute the homomorphic trace: The following steps are performed iteratively log2(N) times:

a. “Lift” the ciphertext modulus to 2Q, which has the effect of making the plaintext modulus 2q.

b. Apply the automorphism from [18], with appropriate key switching to put the result into the same key as the original ciphertext in the iteration.

c. Sum the original and resulting ciphertexts.

d. Divide the ciphertexts by 2.

6) Perform a homomorphic rounding: This operation is described in full detail in Appendix B of [18].

Note:  bootstrapping  is  not  a  reciprocal  of  modulus  reduction.  The  reciprocal  of  modulus 
reduction  is  modulus  switching  (to  a  larger  modulus). 

Bootstrapping does not completely reset noise. If we start with a ciphertext at noise level, say, n_f, every modulus reduction operation changes the noise level from n_i to n_(i-1), where n_(i-1) > n_i. We bootstrap when the noise level reaches n_1. If Bootstrapping takes b levels to complete, then the output of an optimally configured bootstrapping operation will always be at n_f - n_b >= n_1. While bootstrapping does not completely reset noise to the original level, it always resets it to the same level.

3.3.2  SIPHER  parameter  selection  for  SHE/FHE 

The selection of n and q_1, ..., q_t depends heavily on the plaintext modulus p, the depth of computation that needs to be supported, and the desired security level. We capture the primary concerns influencing the selection of a ring dimension n and the moduli q_1, ..., q_t at a high level as follows:

•  The necessary ring arithmetic should be easily supported on the computation substrate, i.e., mod-q_i operations (for i ∈ {1, ..., t}) require few clock cycles.

•  The moduli q_1, ..., q_t are sufficiently large to enable sufficient noise shrinkage via modulus reduction.

•  The  ring  dimension  n  and  noise  parameters  are  sufficiently  large  so  the  scheme  provides 
adequate  security. 

•  The  ring  dimension  n  is  not  so  large  that  it  becomes  overly  time-consuming  and  memory 
intensive  to  manipulate  the  ciphertexts. 

•  The  plaintext  modulus  p  and  any  noise  added  to  the  ciphertext  during  encryption  is 
sufficiently  small  that  we  can  evaluate  reasonably  sized  circuits  with  correct  decryption. 

We choose to add discrete Gaussian noise to the fresh ciphertexts, where r = 6 represents the selected probability distribution parameter. We have found theoretically that the smallest modulus q_1 needs to satisfy the expression q_1 > 4pr·sqrt(n)·w in order to ensure successful decryption, where the parameter w ≈ 6 represents an “assurance” measure for correct decryption (essentially, the probability of decryption failure is bounded by the probability that a normally distributed variable is more than w·sqrt(2π) standard deviations from its mean), and p · r is the Gaussian parameter of the noise used in fresh ciphertexts. (Hence r is the Gaussian parameter of the underlying NTRU-like problem.)

After selecting q_1, we select the remaining q_i ∈ {q_2, ..., q_t} such that q_i > 4p^2 r^5 n^1.5 w^5, which ensures that modulus reduction by a factor of q_i sufficiently reduces the noise after a ComposedEvalMult operation. For implementation simplicity, we set q_i to be the smallest feasible solution to q_i > 4p^2 r^5 n^1.5 w^5. Consequently all q_i are represented by log2(q_t) bits, leading to simpler implementations. Table 1 shows how many bits are required to represent q_1, ..., q_t for varying ring dimensions for p = 2. Note that all q_1, ..., q_t can be represented in less than 64 bits.


Table 1: Bit lengths of the moduli q_i as a function of ring dimension for p = 2.

Ring dimension n      |  512 | 1024 | 2048 | 4096 | 8192 | 16384
Bit length log2(q_i)  |   44 |   45 |   47 |   48 |   50 |    51
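
The bit lengths in Table 1 can be cross-checked against the bound q_i > 4p^2 r^5 n^1.5 w^5. The following is a rough sketch, assuming r = 6 and w = 6 as stated above; the function name is hypothetical, and the sketch only finds the smallest integer above the bound, ignoring any further requirements (such as the moduli being pairwise relatively prime) that a real modulus must meet.

```python
from math import isqrt


def min_modulus_bits(n, p=2, r=6, w=6):
    """Bits needed for the smallest integer q with q > 4 p^2 r^5 n^1.5 w^5."""
    # Square the bound so that n^1.5 becomes n^3 and all arithmetic stays
    # in exact integers (no floating-point rounding near powers of two).
    bound_squared = (4 * p**2 * r**5 * w**5) ** 2 * n**3
    smallest_q = isqrt(bound_squared) + 1   # smallest integer above the bound
    return smallest_q.bit_length()


for n in (512, 1024, 2048, 4096, 8192, 16384):
    print(n, min_modulus_bits(n))
```

For the six ring dimensions listed, this computation yields the same bit lengths as Table 1.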

Following [32]-[35], we use the standard “root Hermite factor” δ as the primary measure of concrete security for a set of parameters. The most recent experimental evidence [32] suggests that δ = 1.007 would require roughly 2^40 core-years on recent Intel Xeon processors to break. Using the estimates from [33], [34], we found that in order to achieve a security level δ for a depth of computation d = t - 1 using the t moduli q_1, ..., q_t, we need to ensure that
n > lg(q_1 · · · q_t)/(4 lg(δ)).

Table 2 shows how δ varies as a function of the ring dimension and depth of computation supported. Based on our analysis, if we impose the requirement that δ < 1.007, then we would need to use ring dimension n = 16384 to support depth d = 13 computations. The colors correspond to roughly equivalent security level.

Table 2: Security level δ as a function of depth of computation supported (columns) and ring dimension (rows) for p = 2

We  will  show  in  our  results  section  later,  that  while  this  suffices  for  most  cases,  when 
Bootstrapping  is  used,  there  are  issues  associated  with  our  parameter  selection  that  limit  the 
security  of  our  FHE  operations  in  our  current  software  and  hardware  implementations  (due  to 
limitations  on  FPGA  circuit  modulus  bit  width). 

3.4  SIPHER  library  and  software  architecture  for  SHE  and  FHE 

We implemented our scheme in the Mathworks Matlab environment and used the Matlab Coder toolkit [36] to generate an ANSI C representation of our implementation. We subsequently hand-modified our auto-generated ANSI C to incorporate the pthreads library [37]. This leveraged efficient mechanisms for lightweight parallelism. We compiled this ANSI C using gcc to run as an executable in a Linux environment. We believe that additional performance improvements could be obtained by implementing our FHE scheme natively in C.

We  chose  to  implement  our  scheme  in  Matlab  because  it  provides  an  interpreted  computation 
environment  for  rapid  prototyping  with  native  support  for  vector  and  matrix  manipulation  which 
simplifies  implementation  development.  We  found  the  Matlab  syntax  to  be  a  natural  fit  for 
writing  software  to  support  the  primitive  lattice  operations  needed  for  our  double-CRT  NTRU- 
based  SHE  design. 

We  wrote  our  Matlab  implementation  of  our  double-CRT  NTRU  SHE  scheme  using  the  Matlab 
fixed-point  toolbox.  The  Matlab  fixed-point  toolbox  also  provides  a  path  toward  generated  HDL 
implementations  of  our  design  that  can  be  deployed  for  practical  use  on  highly  parallel 
computing  hardware  such  as  FPGAs.  Part  of  our  vision  for  the  use  of  our  SHE  design  was  to 
develop  an  FPGA  implementation  of  FHE  [38],  [39]  as  discussed  later  in  this  report. 

3.5  Application  of  SIPHER  to  SHE  VOIP  and  resulting  architecture 

3.5.1  Design  goals  for  SHE  VOIP  teleconferencing 

We  identified  several  design  goals  and  metrics  of  performance  with  which  to  evaluate  and  reason 
over  our  end-to-end  encrypted  VoIP  teleconferencing  designs  and  implementations.  Our  primary 
high-level  design  goals  and  metrics  are: 

1)  Sound  Quality:  The  end-to-end  encrypted  VoIP  teleconferencing  capability  should  provide 
sound  quality  at  least  as  good  as  a  Public  Switched  Telephone  Network  (PSTN),  preferably 
with  full-duplex. 

2)  Latency:  The  end-to-end  encrypted  VoIP  teleconferencing  capability  should  provide  an  end- 
to-end  latency  ideally  of  less  than  100ms  for  trans-continental  VoIP  teleconference  session,  a 
generally  accepted  reasonable  latency  for  VoIP  technologies,  but  more  latency  is  acceptable 
for  inter-continental  operations. 

3)  Scalability:  The  end-to-end  encrypted  VoIP  teleconferencing  capability  should  be  able  to 
support  four  people  speaking  simultaneously  while  tens  of  participants  listen  without 
degradation  in  sound  quality  or  latency. 

4) Secure: The end-to-end encrypted VoIP teleconferencing capability should provide an encryption work factor roughly at least as good as the work factor for AES-128. This means that the VoIP data, when encrypted, should require at least as much computational effort to obtain the unencrypted data without a key as is needed for AES-128, a commonly used point-to-point secure encryption technology.

5)  Resource  Efficient:  FHE  schemes  have  been  known  to  require  encrypted  data  which  is  much 
larger  than  the  original  source  data.  Early  schemes  provided  a  ciphertext  expansion  of  several 
orders  of  magnitude  larger  than  the  source  data.  The  end-to-end  encrypted  VoIP 
teleconferencing  capability  should  ideally  require  less  than  an  order  of  magnitude  ciphertext 
expansion. 

6)  Wide  Geographic  Area:  The  end-to-end  encrypted  VoIP  teleconferencing  capability  should 
operate  with  users  and  the  VoIP  mixing  server  over  a  wide  geographic  area,  ideally 
transcontinental  if  not  inter-continental  without  an  unacceptable  degradation  in  sound  quality 
or  latency. 

7) Portable: The end-to-end encrypted VoIP teleconferencing capability should be easily ported to other client and server types.

8) Easily Deployable: The end-to-end encrypted VoIP teleconferencing capability should be easy to deploy, such as with small binaries.

9)  Usable:  The  end-to-end  encrypted  VoIP  teleconferencing  capability  should  be  intuitive  and 
easy  to  use. 

10)  Extensible:  The  end-to-end  encrypted  VoIP  teleconferencing  capability  should  be  easy  to 
modify  to  add  additional  and  more  advanced  functionality  at  a  later  date. 

3.5.2  Architecture  for  VOIP 

Figure 5 shows a high-level illustrative application of this privacy-preserving VoIP teleconferencing technology with end-to-end encryption. Each client samples its user's voice data, encodes it, encrypts it and sends the result to the VoIP mixer. The mixer sends a result back, which is then decrypted, decoded and played back to the client's user. Any encryption system that supports an additive homomorphism and can be implemented in a practical manner could be used. A representative scheme that supports additive homomorphisms is NTRU, which can be made both Somewhat Homomorphic (SHE) and Fully Homomorphic (FHE) in addition to additively homomorphic.

Our  approach  uses  a  shared  secret  key,  but  more  general  designs  are  possible  that  generalize 
beyond  this  initial  shared  secret  key  design.  Input  voice  streams  from  clients  are  sampled  and 
homomorphically  encrypted  using  a  client’s  public  key.  The  encrypted  voice  samples  are  sent  to 
an  FHE-enabled  VoIP  server  that  does  not  have  access  to  encryption  keys.  The  VoIP  server 
combines  and  balances  the  encrypted  audio  feeds.  The  combined  output  is  then  forwarded  to  the 
client  handsets,  where  it  is  decrypted  and  played  back  for  the  user.  Our  HE-based  solution 
processes  streaming  audio  at  10  kBytes/s  per  voice. 

The output of the processing is sent to the client, where it is decrypted using the client's private
key.  No  keys  are  stored  on  the  teleconference  server,  so  privacy  is  preserved  even  if  an  adversary 
views  all  communication  links  and  operations  on  the  server.  No  trust  of  the  communication  links 
or  teleconference  server  is  required  to  provide  privacy.  The  level  of  security  provided  in  the 
current  prototype  is  roughly  at  the  level  of  AES-128,  but  parallels  between  the  security  levels  of 
the  encryption  scheme  and  other  current  standards  are  not  exact.  We  can  increase  the  security  of 
our  teleconference  capability  to  be  arbitrarily  higher  at  the  expense  of  voice  quality  by 
decreasing  sampling  rate  and  dynamic  range. 




Figure  6  shows  how  the  clients  support  data  flows  internally.  In  the  top  of  the  diagram,  data  from 
the  microphone  is  sampled  and  fed  to  the  encoder,  encrypted  using  an  additive  homomorphic 
encryption  scheme  and  sent  to  the  mixer.  As  seen  in  the  bottom  of  the  figure,  the  result  returned 
from  the  mixer  is  decrypted,  decoded  and  played  back  over  a  speaker. 

Figure 7 shows how the VoIP mixer takes encrypted input from various clients and returns a common output. For a representative VoIP system with clients (c_1, c_2, c_3, ..., c_m), a client c_i would want (c_1 + c_2 + ... + c_(i-1) + c_(i+1) + ... + c_m). This summation can be performed in a tree fashion as illustrated in Figure 7. For our representative NTRU scheme, the ciphertexts are vectorized in blocks of m, and all additions are performed modulo some large integer q pre-specified by the key generator.
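
The mixing step can be sketched as follows. This is an illustration only: plain integer vectors stand in for ciphertexts, on the assumption that homomorphic addition amounts to component-wise addition modulo q, and the function names are hypothetical. Clients are indexed from zero here.

```python
def add_mod(a, b, q):
    """Component-wise sum of two equal-length vectors modulo q."""
    return [(x + y) % q for x, y in zip(a, b)]


def tree_sum(vectors, q):
    """Sum vectors pairwise, level by level, as in a binary adder tree."""
    if not vectors:
        raise ValueError("need at least one vector to sum")
    level = list(vectors)
    while len(level) > 1:
        paired = [add_mod(level[i], level[i + 1], q)
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd vector out moves up unchanged
            paired.append(level[-1])
        level = paired
    return level[0]


def mix_for_client(ciphertexts, i, q):
    """Sum of every client's vector except client i's own."""
    others = ciphertexts[:i] + ciphertexts[i + 1:]
    return tree_sum(others, q)


streams = [[1, 2], [3, 4], [5, 6]]
mixed_for_first = mix_for_client(streams, 0, 7)
```

The tree arrangement keeps the depth of the addition circuit logarithmic in the number of clients, and each level's additions are independent of one another.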

Our encoder/decoder is additive so that we can rely on an additive homomorphism such as the EvalAdd operation to mix VoIP signals. Because we require only an efficient secure EvalAdd operation to support encrypted VoIP mixing, our design builds on the recent efficient FHE design and implementation discussed in [31]. We simplified this prior work by removing the ability to support EvalMult operations. As such, because we only need to support much smaller circuits, we do not need the parallelism capabilities discussed in [31] for our VoIP application and integration with the existing Mumble/Murmur open-source VoIP systems. We also use much smaller parameters than the designs advocated in [31] because we require greatly reduced functionality. Thus, the basis of our encryption approach is a special limited version of FHE called Additive Homomorphic Encryption, which allows an untrusted computation host to compute the encrypted sum of encrypted integers.

Figure 6: High-level data client internal data processing


Client  Vocoder 

We have developed a vocoder technology which takes voice samples from a client and encodes the voice samples as vectors of integers. This vocoder is linear so that it can be used, for example, with an additive homomorphic encryption scheme to provide an encrypted VoIP teleconferencing capability. In this example, the encoded voice samples are encrypted using the additive homomorphic encryption scheme. These operations are performed on multiple clients. The resulting ciphertexts are sent to a VoIP mixer which queues and adds the ciphertexts from the clients. The resulting added ciphertext can be sent back to the clients. When decrypted with the additive homomorphic decryption scheme, decoded using our decoding scheme and played back to the clients, the resulting audio is a mixing of the audio from the clients.

Figure 7: Encrypted VoIP Server mixing for three clients


Our encoding goal is to convert a length-m data frame of y-bit VoIP samples into a length-n frame of integers with the property that Encode(input_1) + Encode(input_2) = Encode(input_1 + input_2). As seen in the left hand side of Figure 8, we split the length-m sample input into blocks of n = 2^(floor(log2(m)))-length vectors and a single (m - n)-length vector if m - n > 0. The first step is to shift the samples so they are centered around 0, mod 2^y. For the zth block of samples, we multiply the integers in this block by 2^((y+2)·(z-1)). We also pad the m - n block of samples with 2n - m zeros so this vector is n samples long. As seen on the right hand side of Figure 8, we sum these vectors. These operations are all highly efficient as they only involve splitting vectors, multiplication by two and bitwise concatenation. This result is the encoded vector and has the desired Encode( ) property above. This encoded data is subsequently used for encryption.

Figure 8: Encrypted VoIP encoding


Figure 9 shows our decoding process. On the right hand side of this figure we take the input vector. We make copies of this block and perform an integer division by 2^((y+2)·(z-1)) for the zth block. We then concatenate these vectors and return the result. As with the encoding operation, these operations are all highly efficient as they only involve splitting vectors, multiplication by two and bitwise concatenation.
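
The layered encoding and decoding can be illustrated with a simplified sketch. It makes several assumptions that are not taken from the report: samples are treated as non-negative y-bit integers (the centering shift is omitted), each layer is given y + 2 bits so that up to four encoded frames can be summed without one layer spilling into the next, blocks are indexed from zero, and all function and parameter names are hypothetical.

```python
def encode(samples, n, y, guard_bits=2):
    """Stack length-n blocks of y-bit samples into one length-n vector."""
    width = y + guard_bits                        # bits reserved per layer
    padded = samples + [0] * (-len(samples) % n)  # zero-pad the last block
    encoded = [0] * n
    for z, start in enumerate(range(0, len(padded), n)):
        for j in range(n):
            # block z occupies bits [width*z, width*(z+1)) of each entry
            encoded[j] += padded[start + j] << (width * z)
    return encoded


def decode(encoded, m, n, y, guard_bits=2):
    """Recover m values by reading each layer back out and concatenating."""
    width = y + guard_bits
    mask = (1 << width) - 1
    blocks = -(-m // n)                           # ceiling of m / n
    out = []
    for z in range(blocks):
        out.extend((v >> (width * z)) & mask for v in encoded)
    return out[:m]                                # drop the zero padding


a = [1, 2, 3, 4, 5, 6]
b = [7, 0, 1, 2, 3, 4]
summed = [x + w for x, w in zip(encode(a, 4, 3), encode(b, 4, 3))]
mixed = decode(summed, 6, 4, 3)   # element-wise sum of a and b
```

Because the encoding is linear, adding two encoded vectors entry by entry gives the encoding of the sample-wise sum, which is the property the mixer relies on.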


3.5.3  Homomorphic  encryption  and  Key  Generation  for  SHE  VOIP 

In this subsection we describe the additive homomorphic cryptosystem we use to construct the end-to-end encrypted VoIP capability built on [31]. This cryptosystem is very similar to the NTRU system [19], [40], though it was not until recently that its homomorphic properties were noticed independently by López-Alt et al. [41] and Gentry et al. [42]. A more general version of this cryptosystem was discussed in [31], but we discuss here a more limited version of the cryptosystem which is simplified for more efficient end-to-end VoIP encryption.

The discussion of this simplified cryptosystem has a high degree of overlap with the more general cryptosystem. Our simplifications reside primarily in the encryption and decryption operations, but for the sake of completeness we include the full key generation and evaluation addition operations, which are also modified, but to a lesser extent. The modifications are primarily in the avoidance of any ciphertext decomposition to parallelize operations when the ciphertext modulus q > 2^64. We have found that we can parameterize the cryptosystem to support the vector addition of adequately large plaintext vectors such that a larger ciphertext modulus is not needed. As such, because we can limit ourselves to 64-bit operations, our simplified cryptosystem can be implemented to run highly efficiently on native 64- and 32-bit processors without the parallelism advances obtained in [31] for more efficient, more general computations.


Figure 9: Encrypted VoIP decoding (figure labels: “Concatenate and remove 0 padding”; “Output vector of m y-bit samples”)


The basic design of the cryptosystem was detailed in section 3.3.1. For the cryptosystem the message space is R_p for some integer p ≥ 2. We use a mod-q Chinese Remainder Transform (CRT) representation of elements to provide fast addition. The basic operations of the scheme are the same as in section 3.3.1, with the exception that the Homomorphic Operations EvalMult and Bootstrapping are not used.

3.5.4  Engineering  tradeoffs  in  parameterizing  SIPHER  for  SHE  VOIP 

We  need  to  choose  parameters  for  both  the  vocoder  and  the  cryptosystem  so  that: 

•  VoIP signal data is encoded into VoIP plaintext.

•  The VoIP plaintext can be securely encrypted into VoIP ciphertext.

•  The summation of multiple VoIP ciphertexts can be successfully decrypted back into VoIP plaintext.

•  The output VoIP plaintext can be decoded into an undistorted VoIP signal.

•  All these operations need to be run efficiently on commodity hardware, such as 64- and 32-bit ARM and x86 processors.

Generally, these concerns mean that:

•  The bitwidth of the VoIP data P needs to be sufficiently large so that given VoIP integer signal vectors from y speakers v_1, v_2, ..., v_y, we are guaranteed that v_1 + v_2 + · · · + v_y = (v_1 + v_2 + · · · + v_y) mod P.

•  The number of layers in the encodings (and hence the ring dimension n) and the plaintext modulus p = 2^x need to be sufficiently large with respect to P so that for the encodings z_1, z_2, ..., z_y where z_i = encode(v_i), we have z_1 + z_2 + · · · + z_y = (z_1 + z_2 + · · · + z_y) mod p.

•  The ciphertext modulus needs to be sufficiently small that we can support computations on the ciphertext efficiently. For modern smart phones this means that the ciphertext modulus is at most 2^64, so we can use native 64-bit computations.

•  The selection of parameters needs to provide a non-trivial root Hermite factor to provide security guarantees.
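
The no-wraparound condition in the first of these bullets can be checked with a small calculation. The sketch below is illustrative only; it assumes non-negative samples, and the function name and the example figures (16-bit samples, four speakers) are hypothetical.

```python
def bits_for_sum(sample_bits, speakers):
    """Bits needed to hold the sum of `speakers` non-negative samples,
    each less than 2**sample_bits, without wrapping around."""
    # (speakers - 1).bit_length() equals ceil(log2(speakers)) for speakers >= 1
    return sample_bits + (speakers - 1).bit_length()


needed = bits_for_sum(16, 4)   # 16-bit samples from four speakers
```

Under these assumptions, mixing four streams costs two extra bits per sample, and the total must still fit within the plaintext modulus.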

The  selection  of  the  ring  dimension  n  and  ciphertext  modulus  q  parameters  depends  heavily  on 
the  desired  security  level  and  the  plaintext  modulus  p.  The  plaintext  modulus  p  depends  on  the 
VoIP  data  modulus  P,  the  number  of  VoIP  streams  that  need  to  be  mixed  without  distortion  y  and 
the  VoIP  data  bit  width  P.  The  selection  of  a  ring  dimension  n  and  the  modulus  q  follow  the 
same  procedure  described  previously  in  section  3.3.2. 

3.5.5  Integration  of  SIPHER  library  with  a  VoIP  teleconferencing  framework 

We evaluated our end-to-end encrypted VoIP capability by implementing our vocoder and homomorphic encryption library and then integrating them with an existing open-source VoIP teleconferencing capability. This activity resulted in an end-to-end encrypted VoIP client for teleconferencing clients running in an Apple iOS environment, composed of a) the open-source Mumble VoIP client, modified and integrated with b) a custom linear codec of our design written in ANSI C and c) an FHE encryption library ported from Matlab to ANSI C. We also wrote and deployed the VoIP server capability running on Linux computing devices to perform the homomorphic mixing operation. We describe the implementation of this capability in this section.

Codec  and  Homomorphic  Encryption  implementation 

As  with  the  cryptosystem  design,  our  implementation  used  for  an  additive  homomorphic 
encryption  library  is  a  customization  of  the  design  introduced  in  [31].  We  implemented  our 
scheme  in  the  Mathworks  Matlab  environment  and  used  the  Matlab  Coder  toolkit  [36]  to 
generate  an  ANSI  C  library  of  our  implementation.  We  believe  that  additional  performance 
improvements  could  be  obtained  by  implementing  our  HE  scheme  natively  in  C. 

As  mentioned  in  section  3.4,  we  chose  to  implement  our  scheme  in  Matlab  using  the  Matlab 
fixed-point  toolbox.  We  implemented  the  vocoder  capability  in  native  ANSI  C.  We  compiled 
this  capability  using  the  gcc  tool  to  create  a  vocoder  library  which  we  then  integrated  with  the 
homomorphic  encryption  library  and  a  VoIP  teleconferencing  substrate. 

VoIP Teleconferencing substrate

Rather  than  construct  a  VoIP  capability  from  whole  cloth,  we  decided  to  construct  an  end-to-end 
encrypted  VoIP  teleconferencing  capability  by  integrating  our  additive  homomorphic  encryption 
library  and  our  vocoder  library  with  an  existing  open-source  VoIP  teleconferencing  library.  We 
selected the Mumble VoIP library (http://mumble.sourceforge.net) for this integration because
Mumble is mature, offers high sound quality, and runs on a variety of platforms.

We decided to implement our end-to-end encrypted VoIP teleconferencing capability for iOS
clients because the native iOS development environment uses Objective-C, a superset of ANSI C.
However, even though we only developed iOS clients, there is no reason our client library could
not be integrated in other environments such as Android, Windows, Mac or BlackBerry
clients.

By  integrating  with  the  Mumble  library,  our  end-to-end  encrypted  VoIP  library  has  the  same 
usage and deployment models as the standard Mumble capability. Notably, Mumble clients
present the user with a simple, easy-to-use graphical user interface that can be understood with
minimal training. An image of the modified client running on an iPod Touch can be seen in
Figure 10, where the client is running in push-to-talk mode. This client is visually indistinguishable
from the standard iOS Mumble client. The Mumble software can also be deployed through an app
store model, or as binaries which can be loaded onto iOS devices through Xcode.

We  integrated  the  iOS  capability  so  that  client  handsets  encrypt  their  audio  streams  using  the 
client’s  public  key.  The  proxy  server  computes  over  that  encrypted  data  without  decrypting  the 
data or sharing keys. The output of the processing is sent to the client, where it is decrypted using
the client's private key. No keys are stored on the teleconference server, so privacy is preserved
even  if  an  adversary  views  all  communication  links  and  operations  on  the  server. 

This integration was relatively straightforward, with several notable changes made to reduce packet
drops and improve sound quality:

1)  The client application generated voice packets that contained 480 samples at 48 kHz, or 10
mSec  worth  of  sound.  The  sound  driver,  however,  generated  slightly  larger  packets.  As  a 
result,  the  period  of  the  sound  packets  was  slightly  larger  than  10  mSec,  and  every  so  often 
two  sound  packets  were  generated  back  to  back.  The  original  server  set  a  10  mSec  timer  and 
just  accepted  one  packet  every  10  mSec.  We  added  a  small  queue  at  the  server  so  we  did  not 
drop  packets  when  we  received  two  packets  in  a  row  very  quickly. 

2)  We  generated  new  frame  numbers  at  the  server  as  opposed  to  re-using  the  client  frame 
numbers.  The  clients  correlated  the  frame  numbers  with  time.  This  cut  down  on  the  time 
jitter  with  regard  to  frame  numbers. 

3)  The  encryption  and  decryption  operations  for  our  applications  were  processor  intensive,  and 
were  run  in  batches  of  several  audio  packets  at  once.  We  moved  the  encryption  and 
decryption  operations  to  a  low  priority  thread  and  had  the  higher  priority  thread  accept  and 
queue  new  audio  packets  (both  from  the  network,  and  from  the  microphone).  This  helped 
prevent a situation where audio packets were dropped because we were too busy
decrypting  or  encrypting. 
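
The effect of the small server-side queue described in item 1 can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the Mumble server code: the function name `serve`, the arrival pattern and the queue capacities are hypothetical, and each loop iteration stands for one 10 mSec server tick in which at most one packet is forwarded.

```python
from collections import deque

def serve(arrivals_per_tick, queue_capacity):
    """Simulate a server that forwards at most one packet per 10 mSec tick.

    arrivals_per_tick: number of packets arriving during each tick (hypothetical data).
    queue_capacity: how many packets the server can hold; 1 models the original
    behaviour of accepting a single packet per tick.
    Returns (forwarded, dropped).
    """
    pending = deque()
    forwarded = 0
    dropped = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            if len(pending) < queue_capacity:
                pending.append(1)   # buffer the packet
            else:
                dropped += 1        # no room: the packet is lost
        if pending:
            pending.popleft()       # forward one packet this tick
            forwarded += 1
    return forwarded, dropped

# A sender whose period is slightly longer than the tick: occasionally a tick
# sees no packet, and occasionally two packets arrive back to back.
pattern = [1, 1, 0, 2, 1, 0, 2, 1, 0]

print("no queue:   ", serve(pattern, queue_capacity=1))
print("small queue:", serve(pattern, queue_capacity=4))
```

In this toy pattern the back-to-back arrivals are lost when only one packet can be held, while a queue of a few entries absorbs them during the following idle ticks.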

The  goal  of  these  changes  was  to  reduce  the  drop  rate  of  packets  (an  issue  with  initial 
prototypes). This, in turn, allowed us to increase the audio sampling rate. As a result, we achieved
sampling of 10-bit samples at a rate of 48 kHz. This configuration provides a sound quality
substantially better than PSTN as long as there are only a few packet drops.


After sampling the audio, we queue 90 mSec blocks of this data and pass them to our encoder.
We designed the system to accommodate 4 speakers, so the homomorphic mixer adds
four 10-bit integers homomorphically, resulting in a 12-bit plaintext without the encoding
layering. If we use a ring dimension n = 1024, we are required to use 2-layer encoding and have
a resulting plaintext modulus of p = 2^24 = 16777216. This encoding and encryption results in a
root Hermite factor of δ = 1.006, which is currently believed to be at least as secure as AES-128.
With these parameter settings we observed that, when running on an iPhone 5s, the encoding and
encryption operation took a mean time of 9.2 mSec and decryption and decoding took 4.6 mSec.
The summation on the VoIP server took 0.5 mSec. Transport of encrypted VoIP traffic from
Cambridge, MA to the Northern Virginia Amazon AWS servers took an average of 15 mSec.
This resulted in a mean latency much less than our 100 mSec threshold for VoIP traffic, well
within reasonable bounds, both in theory and in practice.
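
The bit-width arithmetic in this paragraph can be checked with a few lines of Python. This is a worked example rather than part of the SIPHER library: it assumes unsigned samples for simplicity, and the helper name `bits_needed` is hypothetical.

```python
def bits_needed(max_value):
    """Number of bits required to represent integers in the range 0..max_value."""
    return max(1, max_value.bit_length())

SAMPLE_BITS = 10   # bit width of each audio sample (from the text)
SPEAKERS = 4       # number of simultaneous speakers the mixer supports

max_sample = 2**SAMPLE_BITS - 1        # largest unsigned 10-bit sample
max_sum = SPEAKERS * max_sample        # largest possible mixed value
mixed_bits = bits_needed(max_sum)      # width of the mixed plaintext

p = 2**24                              # plaintext modulus quoted in the text

print("largest sample:", max_sample)
print("largest sum of four samples:", max_sum)
print("bits needed for the sum:", mixed_bits)
print("plaintext modulus p:", p)
```

Adding four 10-bit values therefore needs 12 bits, and 2^24 equals (2^12)^2, which is at least consistent with two 12-bit layers. The layered encoding itself is described earlier in the report and is not reproduced here.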

3.6  Application of SIPHER to FHE keyword search (KWS) of encrypted documents

We developed an FHE application that searches for encrypted keywords on encrypted text. This
method relies on a homomorphic string comparison operation that is repeated for all keywords in
all locations in an encrypted message. We imported this technology into a mail-guard-type
scenario to provide outsourced mail filtering based on keywords of interest to email clients. A
sketch of the use of this technology can be seen in Figure 11.


Figure  11:  Architecture  of  the  FHE-based  Encrypted  keyword  search  E-Mail  Guard 


The basis of our approach is a homomorphic “Secure Symbol Matching” method that relies on a
set of symbols (called an alphabet) that is initially mapped to an integer representation. This
approach is seen in Figure 12. In this example, each ASCII character in a text file maps to an
integer between 0 and 128. We can compute the equality of two encrypted characters by
homomorphically evaluating their difference and then homomorphically raising the difference to
the (p-1) power. For a prime plaintext modulus p, Fermat's little theorem guarantees that this
power is 0 when the two characters are equal and 1 otherwise. Using a related technique we can
logically AND the results for all the characters in a keyword into a single encrypted true/false bit.
For a k bit keyword, we repeat the k-bit string comparison multiple times, over the entire
encrypted message, each time shifting the starting point in the encrypted text to the next letter.
The logical AND of all the string search results is done in a binary tree structure to minimize the
number of repeated AND operations the data must undergo. (ANDs are performed with
ComposedEvalMult operations, so a t-level SHE implementation can only perform t repeated
ComposedEvalMult on any one piece of encrypted data; a binary tree AND lets us compute 2^t
concatenated ANDs without Bootstrapping.)
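
The symbol-matching idea can be sketched on unencrypted integers, where modular arithmetic stands in for the homomorphic operations. This is an illustration under stated assumptions, not the SIPHER implementation: the prime `P = 257`, the function names, and the use of `1 - x` to turn the Fermat power into a match bit are all choices made for this example.

```python
P = 257  # hypothetical prime plaintext modulus, larger than any character code used

def eq_bit(a, b, p=P):
    """Return 1 if a == b (mod p), else 0, via Fermat's little theorem.

    (a - b)^(p-1) mod p is 0 when a == b and 1 otherwise, so 1 minus that
    value is a match bit.
    """
    return (1 - pow(a - b, p - 1, p)) % p

def and_tree(bits, p=P):
    """AND a list of 0/1 values by pairwise multiplication in a binary tree.

    Returns (result, depth), where depth is the number of multiplication
    levels any single value passes through.
    """
    level = list(bits)
    depth = 0
    while len(level) > 1:
        nxt = [(level[i] * level[i + 1]) % p for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over to the next level
        level = nxt
        depth += 1
    return level[0], depth

def keyword_hits(text, keyword):
    """Slide the keyword over the text; emit one match bit per start position."""
    codes = [ord(c) for c in text]
    kw = [ord(c) for c in keyword]
    hits = []
    for start in range(len(codes) - len(kw) + 1):
        bits = [eq_bit(codes[start + i], kw[i]) for i in range(len(kw))]
        hits.append(and_tree(bits)[0])
    return hits

print(keyword_hits("abcab", "ab"))
print(and_tree([1] * 8))
```

The tree combines eight match bits in three multiplication levels, which is the reason a binary tree is preferred over a sequential chain when the number of chained multiplications is limited.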

Because  our  keyword  search  was  done  on  a  regular  computing  platform  (PC)  we  did  not  develop 
any novel encoding techniques as we did for VoIP. Rather, we focused on accelerating the KWS
using  our  FPGA  hardware  accelerator.  Our  results  will  be  presented  in  section  4. 

Figure 12: String comparison operations for keyword search


3.7  Implementation issues for parallelism and FHE hardware acceleration

We have covered the underlying crypto system in detail, but now we will focus on aspects that are
important when implementing accelerators in hardware. The main advantage of our system is the
use of the “double-CRT” representation of cipher texts which is discussed in [43]. With this
double-CRT representation, we can select parameters so that cipher texts are secure when
represented as matrices of 64-bit integers, but still support the secure execution of programs on
commodity computing devices without expending unnecessary computational overhead
manipulating large multi-hundred-bit or even multi-thousand-bit integers. Additionally, the
parallelism implicit in this data representation is easily exploited to achieve efficiencies during
implementation. Figure 13 shows a schematic representation of how ciphertext is represented in
this double CRT format.

Figure 13: Double CRT representation of ciphertext allows for cipher text to be split into
multiple towers




Our implementation encrypts a plaintext bit into a two-dimensional array of 64-bit unsigned
integers (see footnote 1). We use a residue number system implementation to represent cipher
texts as T sets of length-n integer vectors. The ring in tower entry t has a unique modulus q_t,
which bounds all entries in that ring. The n dimension is known as the ring size, and the t
dimension as the tower size. This representation allows us to operate in parallel on the smaller
bit width mod-q_t values instead of on a single modulus q of much larger bit width, where
q = q_1 * q_2 * ... * q_T for pairwise co-prime moduli q_t.
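
The residue number system idea can be demonstrated on plain integers: a value modulo q is split into one small residue per tower modulus, arithmetic is done independently per tower, and the Chinese Remainder Theorem recombines the result. This is a minimal sketch with toy moduli, not the 64-bit moduli of the actual implementation, and the function names `to_towers` and `from_towers` are hypothetical. It covers only the integer half of the double-CRT representation.

```python
from math import gcd, prod

def to_towers(x, moduli):
    """Split an integer into one residue per tower modulus q_t."""
    return [x % qt for qt in moduli]

def from_towers(residues, moduli):
    """Recombine residues into the unique value modulo q = q_1 * ... * q_T."""
    q = prod(moduli)
    total = 0
    for r, qt in zip(residues, moduli):
        rest = q // qt                            # product of all other moduli
        total += r * rest * pow(rest, -1, qt)     # pow(x, -1, m) is the modular inverse
    return total % q

moduli = [97, 101, 103]   # toy pairwise co-prime moduli
assert all(gcd(a, b) == 1 for i, a in enumerate(moduli) for b in moduli[i + 1:])
q = prod(moduli)

a, b = 123456, 654321
a_towers = to_towers(a, moduli)
b_towers = to_towers(b, moduli)

# Multiply tower by tower; each tower only ever sees small numbers.
product_towers = [(x * y) % qt for x, y, qt in zip(a_towers, b_towers, moduli)]

print(from_towers(product_towers, moduli) == (a * b) % q)
```

Because each tower is independent, the per-tower multiplications could run concurrently, which is the property the text relies on for parallel hardware.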

As  outlined  in  section  3.3  previously,  our  implementation  requires  only  a  few  elementary 
operations  to  be  implemented  on  the  FPGA  hardware  in  order  to  achieve  large  run  time  speedups 
over  conventional  CPU  implementations.  These  operations  are: 

■  RingAdd:  c_{n,t} = (a_{n,t} + b_{n,t}) % q_t,

■  RingSub:  c_{n,t} = (a_{n,t} - b_{n,t}) % q_t,

■  RingMul:  c_{n,t} = (a_{n,t} * b_{n,t}) % q_t.

All three of the above operations can be parallelized or pipelined over both n and t. Also
required are:

■  CRT and Inverse CRT, which are implemented as a Number Theoretic Transform [44] coupled
with a pre- or post-RingMul with an appropriate Twiddle Vector.

■  Round:  A  function  to  perform  modulo  rounding  using  different  tower  moduli  (detailed  below). 
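
The three element-wise ring operations listed above can be written directly against an n-by-T array of residues. The following is a software illustration only, with toy moduli and hypothetical function names. It is not the FPGA implementation, and the CRT/NTT and Round steps are not shown.

```python
import operator

def ring_op(a, b, moduli, op):
    """Apply op element-wise to two n-by-T arrays; column t is reduced mod moduli[t]."""
    return [[op(x, y) % qt for x, y, qt in zip(row_a, row_b, moduli)]
            for row_a, row_b in zip(a, b)]

def ring_add(a, b, moduli):
    return ring_op(a, b, moduli, operator.add)

def ring_sub(a, b, moduli):
    # Python's % always returns a non-negative result for a positive modulus.
    return ring_op(a, b, moduli, operator.sub)

def ring_mul(a, b, moduli):
    return ring_op(a, b, moduli, operator.mul)

moduli = [97, 101]            # toy tower moduli q_1, q_2 (T = 2)
a = [[5, 100], [96, 7]]       # n = 2 rows, one column per tower
b = [[95, 3], [2, 50]]

print(ring_add(a, b, moduli))
print(ring_sub(a, b, moduli))
print(ring_mul(a, b, moduli))
```

Every output entry depends only on the two input entries at the same (n, t) position, so the loops can be unrolled, pipelined or distributed across cores in any order.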

The  two  repeated  key  ring  operations  EvalAdd  and  EvalMult  are  the  core  functions  in  FHE. 
When  our  parameters  are  chosen  such  that  a  single  plaintext  bit  is  encrypted,  the  resulting 
operations  on  the  encrypted  data  are  XOR  and  AND  respectively.  These  two  operations  allow  us 
to implement any Boolean operation on input cipher text (see footnote 2).
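
The claim that XOR and AND suffice for any Boolean function can be checked on plaintext bits, where addition and multiplication modulo 2 play the roles of EvalAdd and EvalMult. The functions below operate on unencrypted bits and are named hypothetically for this sketch. In the real system the same gate structure would be applied to ciphertexts.

```python
def eval_add(a, b):
    """Addition modulo 2 behaves as XOR on single bits."""
    return (a + b) % 2

def eval_mult(a, b):
    """Multiplication modulo 2 behaves as AND on single bits."""
    return (a * b) % 2

def gate_not(a):
    return eval_add(a, 1)                 # NOT(a) == XOR(a, 1)

def gate_nand(a, b):
    return gate_not(eval_mult(a, b))      # NAND(a, b) == NOT(AND(a, b))

def gate_or(a, b):
    return gate_nand(gate_not(a), gate_not(b))   # De Morgan, built from NAND and NOT

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "XOR", eval_add(a, b), "AND", eval_mult(a, b),
              "NAND", gate_nand(a, b), "OR", gate_or(a, b))
```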

As mentioned previously, this crypto system, like many FHE systems, is random (noisy) in
nature. Because of this, only a limited number of operations can be performed on the encrypted
data before the noise dominates and decryption is no longer guaranteed. EvalAdd does not add
noise to the system, so an unlimited number of such operations are allowed to be chained
together. EvalMult, however, does add noise, and this limits the number of such operations that
can be chained together. The double CRT representation allows a very straightforward
implementation that controls this noise. This requires the use of both key switching and modulus
reduction whenever an EvalMult is performed. The combination of these three steps is known as
a Composed EvalMult (CEM). The property of CEM is that for a pair of inputs of a given tower
size t, the output is a cipher text of tower size t-1. Thus for an initial tower size of T, at most
(T-1) CEM operations can be performed, allowing SHE. Figure 14 shows the impact CEM has on
system parallelism.
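
The tower bookkeeping described above can be modelled without any cryptography: each Composed EvalMult consumes one tower, so a ciphertext that starts with T towers supports at most T-1 chained multiplications. The class and function names below are hypothetical and the sketch tracks only the tower count, not ciphertext data or noise.

```python
class LeveledCiphertext:
    """Stand-in for a ciphertext that records only its current tower size t."""

    def __init__(self, towers):
        if towers < 1:
            raise ValueError("a ciphertext needs at least one tower")
        self.towers = towers

def composed_eval_mult(a, b):
    """Model of CEM: inputs of tower size t yield an output of tower size t-1."""
    if a.towers != b.towers:
        raise ValueError("inputs must have the same tower size")
    if a.towers < 2:
        raise ValueError("no tower left to drop; bootstrapping would be needed")
    return LeveledCiphertext(a.towers - 1)

def max_chained_cems(initial_towers):
    """Count how many CEMs can be chained from a fresh ciphertext."""
    ct = LeveledCiphertext(initial_towers)
    count = 0
    while ct.towers > 1:
        ct = composed_eval_mult(ct, ct)
        count += 1
    return count

for T in (2, 4, 16):
    print("T =", T, "-> chained CEMs:", max_chained_cems(T))
```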


1  While  the  actual  number  of  bits  is  determined  by  the  parameter  selection  of  the  cryptosystem,  we  select  64  as  our 
maximum  dimension  for  FPGA  implementation. 

2  Any arbitrary Boolean function can be constructed from NAND operations. Since NOT(a) == XOR(a, 1), and
NAND(a, b) == NOT(AND(a, b)), the two Homomorphic operations are a sufficient set.

Figure 14: Composed EvalMult is the only operation not parallelized. Each CEM has a
mod reduction step that removes one ciphertext column


The dimensions of the cryptosystem are determined algorithmically, and are a function of
the security required and the number of CEM operations required to implement the desired
application. If the number of operations required by the application exceeds O(16), then
Bootstrapping will be required to reset the noise generated by the cryptographic operations.
Bootstrapping is currently on the order of 10 CEM equivalent operations for reasonable security
parameters. Bootstrapping has the property of taking a cipher text of tower size t, and generating
a new ‘refreshed’ cipher text of the system's original tower size T. Thus an unlimited number of
operations can be performed on the data, enabling FHE.

All ring operations other than CEM are embarrassingly parallel, i.e. all data needed for an
operation stays within a particular tower. In fact, within the CEM, everything except the Round
operation (discussed later) of the modReduce is also embarrassingly parallel. This parallelism
can be exploited in our implementation both on multicores and on the FPGA, as shown in
Figure 15.

Our  current  SHE  scheme  relies  on  operations  that  are  generally  inefficient  to  implement  on 
standard  CPU  architectures  (i.e.  modular  arithmetic  with  a  large  modulus).  For  convenience, 
most  of  the  previously  published  SHE  and  FHE  implementations  have  used  standard  tools  such 
as  the  GNU  Multiple  Precision  Arithmetic  Library  (GMP)  [45],  which  enable  researchers  to 
code  operations  using  very  large  integers.  This  limits  their  focus  to  operations  on  CPUs  and  does 
not  allow  them  to  take  advantage  of  specialized  parallel  computation  hardware  like  FPGAs 
which  provide  highly  cost-effective  parallelism.  Our  approach  to  developing  the  FPGA  code  for 
implementing efficient ring operations is to develop arithmetic circuits that will achieve high
throughput  by  using  parallelism  and  pipelining  on  the  FPGA  as  seen  in  Figure  15. 


Figure 15: Ciphertext towers enable ring operations to be performed on each tower
separately. Parallelism is achieved through concurrency on a multicore, or pipelining on an
FPGA. (Panels, plotted against time: “Previous scheme: single chain of execution” and
“Double CRT: parallel chains of execution with synchronization required only at enc and
mod reduce steps”.)


Our  approach  was  to  develop  prototype  descriptions  in  Matlab  using  the  fixed  point  toolbox.  We 
then  re-implemented  these  primitives  in  Simulink  in  a  stream-oriented  style  that  allows 
conversion  to  VHDL.  The  results  of  the  two  implementations  are  then  directly  compared  to 
verify correctness. A conversion from Simulink to VHDL is done in a completely automated
fashion using MathWorks' HDL Coder. This tool chain provides us the means to develop our
primitives,  including  testing  of  the  resulting  VHDL  on  FPGA  hardware,  much  faster  than 
traditional  methods.  Some  examples  of  efficiency  are: 

■  The  Matlab  and  Simulink  Models  are  driven  with  the  same  fixed  point  data  variables,  and 
generate  the  same  format  output,  simplifying  test  and  comparison 

■  The  bit  width  of  the  circuits  is  specified  at  compile  time  by  specifying  the  bit  width  of  the  input 
data.  The  sizing  of  intermediate  mathematical  operations  is  done  automatically  by  the  fixed 
point  toolbox.  Thus  many  of  the  same  models  can  be  used  for  8  through  64  bit  inputs. 

■  The  resulting  VHDL  is  vendor  independent.  This  allows  for  rapid  benchmarking  on  multiple 
architectures.  However,  hand  optimization  of  VHDL  may  be  required  for  optimum 
performance  in  order  to  take  advantage  of  vendor  specific  IP. 

■  MathWorks' HDL Verifier allows automatically generated FPGA-in-the-loop testing to verify
the operation of the resulting VHDL on actual hardware very early on in the program.

4  RESULTS  AND  DISCUSSION

4.1  SIPHER  SHE  library  functions  experimental  timing  results 

We  ran  our  compiled  C  code  (auto-generated  from  Matlab)  SIPHER  library  implementation  on 
the DARPA Deathstar 64-core server with 2.1 GHz Intel Xeon processors and 1 TB of RAM in a
CentOS  environment.  Although  we  had  access  to  many  resources,  we  used  at  most  10  GB  of 
memory  and  20  cores  during  the  evaluation  of  our  software  implementation. 

We  collected  data  on  the  runtime  of  the  Encryption,  EvalAdd,  ComposedEvalMult,  and 
Decryption  operations  over  selections  of  depth  of  computation  supported  and  ring  dimension. 

We  ran  100  iterations  of  this  collection  procedure  for  each  combination  of  t  and  ring  dimension. 
We  used  different  randomly  selected  key  sets,  plaintexts  and  encryption  noise  with  every 
iteration  to  mitigate  minor  variations  in  performance  that  may  arise  due  to  these  experimental 
random  variables  on  every  iteration.  Tables  of  the  raw  mean  runtime  results  can  be  seen  in  Table 
14  through  Table  17  in  Appendix  A. 

We collected data on the runtime of the Encryption, EvalAdd and ComposedEvalMult operations
for settings of t ∈ {2, 4, 6, ..., 20} and for ring dimensions n ∈ {512, 1024, 2048, 4096, 8192,
16384}. We collected data on the runtime of the Decryption operation of final ciphertexts, for
computations with fresh (input) ciphertexts with ring dimensions n ∈ {512, 1024, 2048, 4096,
8192, 16384} and depth of computation t-1 for t ∈ {2, 4, 6, ..., 20}. Note that due to ring
switching, the decryption runtime is dependent only on the dimension of the final ciphertext,
which is a function of the initial ciphertext and depth of computation. We did not collect data on
the runtime of the Bootstrapping operation at this point but present it later on in this report.
As discussed in [40], the depth of computation required for bootstrapping is logarithmic in the
ring dimension.

Our experimental results show that run times grow linearly with the ring dimension n and the
ciphertext width t, where t - 1 is the depth of computation supported such that bootstrapping or
decryption can still be performed with a high probability of recovering a correctly
decrypted ciphertext. This makes intuitive sense because as we double either the ring dimension
or the ciphertext width, we roughly double the amount of computation that needs to be
performed with every Encryption, EvalAdd and ComposedEvalMult operation. Similar results
hold for Decryption (Table 17), which shows a linear dependence of runtime on ring dimension,
but under the assumption that decryption occurs after t - 1 ModReduction operations, including
ModReduction operations bundled in ComposedEvalMult operations. Our initial results show
that Bootstrapping runtime is similarly linear with respect to the maximum ring dimension. As
compared to the results reported in [4], [11], [21], our FHE software implementation provides
order-of-magnitude improvements in the runtime of the FHE operations.

4.2  SHE  VOIP  Teleconferencing  experimental  results 

We  experimentally  evaluated  the  performance  of  the  VoIP  service  by  deploying  our  encrypted 
VoIP  servers  in  each  of  the  Amazon  AWS  data  centers  across  the  world.  We  then  connected 
iPod Touch clients to each of the servers through various connection types in the metro area of a
United States city in southern New England. These connections included an 802.11n wireless
enterprise gateway connected to a high-speed enterprise Internet connection; 4G LTE, 3G
and 2G connections over the T-Mobile commercial wireless service; and an AT&T DSL
connection in a rural area outside the city.

We measured the upload and download throughput of the connections, the drop rate of VoIP
packets routed through the various server locations and the subjective quality of the VoIP
teleconference session as judged by the experimenters. The upload and download throughput
was measured with the Ookla throughput measurement app [46] on the client devices. VoIP drop
rates were measured experimentally by modifying the VoIP servers to measure drop rates. Voice
quality was measured in comparison to PSTN voice quality, where “Excellent” means the VoIP
conversation was better than PSTN, “Good” means the VoIP conversation was comparable to
PSTN, “Poor” means the VoIP conversation was worse than PSTN but still usable for
communication, and “Unusable” means the connection was useless for communication.

All of the experiments were run over a 2 hour period on a weekday evening using 2 iPod Touch
clients with servers deployed on Amazon AWS t1.micro instances [47]. Each of the clients
was on an independent connection to the Internet at all times, so there was low likelihood of one
client contributing substantially to congestion for the other client.

Table  3  shows  the  upload  and  download  throughput  observed  by  each  of  the  clients  for  each  of 
the  connections.  Note  that  the  rural  DSL  service  provided  better  throughput  than  the  2G 
connection  and  better  download  throughput  than  the  3G  connection. 

Table  4  shows  the  packet  drop  rates  observed  at  each  of  the  servers  at  the  various  Amazon  AWS 
locations  for  the  various  client  connection  types.  Note  that  distance  between  the  client  and  server 
had  only  a  minor  impact  on  drop  rates,  while  the  connection  type  had  a  very  large  impact  on 
drop  rates.  This  implies  that  the  connection  could  be  a  bottleneck  for  the  VoIP  service. 

Table  5  shows  the  subjective  VoIP  teleconference  quality  measurements  observed  through  each 
of  the  servers  at  the  various  Amazon  AWS  locations  for  the  various  client  connection  types.  Note 
that  distance  between  the  client  and  server  had  almost  no  observed  impact  on  voice  quality, 
while  the  connection  type  had  a  very  large  impact  on  voice  quality. 

We observed that all of the various connections supported acceptable VoIP teleconference
capabilities except for the 2G connections. Over all of the acceptable connections, the lowest
upload or download throughput observation was on the 3G download: 0.43 Mb/sec. Because the
VoIP download and upload data flows are symmetric, this implies that a connection with at least
0.43 Mb/sec upload and download throughput is required to support VoIP teleconferencing using
our prototype.

In addition to our tests of connection-server pairings, we also tested the scalability of the number
of clients that could be supported on a single server. For this experiment we connected 7 iPod
Touch and/or iPhone 5s clients at various connections on the eastern United States seaboard to a
single VoIP server in the Amazon AWS Northern Virginia data center. With these 7 connections
running simultaneously and 4 people speaking simultaneously, we were able to hold as good a
conversation as is possible with 4 people speaking at once, and no voice distortion was
observed by the 3 non-speaking client users.


Table 3: Experimentally measured data throughput in Mb/s for connection types

Connection type       Upload    Download
Enterprise 802.11n    38.22     36.53
4G LTE                35.82     17
3G                    6.31      0.43
2G                    0.2       0.16
Rural DSL             2.55      0.47

Table 4: Packet Drop Rates for various server locations and client internet connection types

Server Location   Client Location   Enterprise 802.11n   4G LTE   3G     2G     Rural DSL
N. Virginia       S. New England    0%                   10%      10%    66%    33%
Oregon            S. New England    0%                   2%       3%     71%    35%
N. California     S. New England    0%                   7%       8%     67%    34%
Ireland           S. New England    0%                   7%       7%     73%    38%
Singapore         S. New England    5%                   2%       2%     68%    39%
Tokyo             S. New England    1%                   3%       4%     69%    37%
Sydney            S. New England    5%                   3%       3%     67%    34%
Sao Paulo         S. New England    0.30%                4%       6%     76%    34%

Table 5: Teleconference Quality for various server locations and client internet connection types

Server Location   Client Location   Enterprise 802.11n   4G LTE   3G     2G         Rural DSL
N. Virginia       S. New England    Excellent            Good     Good   Unusable   Poor
Oregon            S. New England    Excellent            Good     Good   Unusable   Poor
N. California     S. New England    Excellent            Good     Good   Unusable   Poor
Ireland           S. New England    Excellent            Good     Good   Unusable   Poor
Singapore         S. New England    Excellent            Good     Good   Unusable   Poor
Tokyo             S. New England    Excellent            Good     Good   Unusable   Poor
Sydney            S. New England    Excellent            Good     Good   Unusable   Poor
Sao Paulo         S. New England    Excellent            Good     Good   Unusable   Poor

4.3  VOIP  SHE  discussion 

Up to now, advances in secure VoIP technologies have focused on providing security for data in
transit [24], [25], along with other general security challenges such as DDoS attacks [48] and
identity and key management [49], among many others [50], [51]. These are all important
challenges for secure VoIP teleconferencing capabilities, but a reliance on point-to-point
encryption between participants has too often led to complicated VoIP teleconferencing systems
and protocols [52]. In general, the complicated layering of protection mechanisms is often
difficult to execute in practice, leading to overly complicated systems which are difficult to build
and maintain. Further, these complicated systems are often difficult to perform security audits on
[53]-[55]. Although all of the partial security solutions have worked very well in isolation and
have served their purposes as a rule, the at times complicated layering of these protocols has
resulted in the introduction of possible security holes which have enabled data leakage.

To the best of our understanding, there have been no VoIP teleconferencing technologies which
provide end-to-end encryption. Our solution seeks to provide a clean-slate data protection
capability that is also compatible, or at least easily integrated, with existing VoIP protocols and
architectures. Because we provide end-to-end data encryption, our solution protects data against
leakage even when layered with existing VoIP protocols for signaling and transport. Besides
providing security against data leakage due to compromised servers, end-to-end encrypted VoIP
teleconferencing has the potential to greatly simplify existing VoIP protocols, resulting in
much simpler implementations and designs, and thus in more efficient VoIP
implementations that are easier to audit.

The  basis  of  our  design  and  implemented  prototype  for  end-to-end  encrypted  VoIP 
teleconferencing  is  driven  by  and  builds  on  recent  breakthroughs  in  practical  Fully 
Homomorphic  Encryption  (FHE).  Recent  breakthroughs  in  Homomorphic  Encryption  have 
shown  that  it  is  theoretically  possible  to  securely  run  arbitrary  computations  over  encrypted  data 
without decrypting the data [1], [2]. There has been recent work on designing and implementing
variations of homomorphic encryption schemes [4], [11], [15], [17], [20], [21], [41], [56]-[58].
These implementations have become increasingly practical with published results on both the
runtime of isolated secure computing operations for some implementations [4], [11], [21] and
evaluations of composite functions like AES [17], [20], [58].

Current approaches to designing FHE schemes rely on a special, highly complex and
computationally difficult operation called bootstrapping [18] to support the encrypted execution
of arbitrary functions. As such, we use a simplification of the general FHE designs called
“leveled” homomorphic encryption or Somewhat Homomorphic Encryption (SHE) that supports
limited depth computations, such as vector addition, which is much more efficient because it
does not require the use of bootstrapping.

Besides the runtime challenges of HE designs, there are serious application issues associated
with data structures and representations [17]. Furthermore, it has not been well explored how to
convert existing data structures and algorithms into forms that can be efficiently executed using
FHE technologies. This is because FHE provides a very different computation model from
existing RAM computing devices and the porting of known data structures and algorithms (such
as for VoIP mixing) is non-trivial, especially for highly efficient encrypted execution of these
algorithms over the encrypted input data. As an example of limitations, early uses of FHE relied
on encrypting individual bits in ciphertext. These limitations, in addition to the inherent
computational cost of secure computing using known FHE schemes, have until now prevented the
practical use of FHE. Our innovation comes from designing a set of data structures, a data
encoding method (which we refer to as a vocoder) and a homomorphic mixing operation which
together support a practical implementation of end-to-end encrypted VoIP teleconferencing.

In  particular,  a  key  innovation  of  ours  is  to  go  beyond  simple  bit-per-ciphertext  encodings  by 
placing entire VoIP data frames into each ciphertext. These codec designs are in some sense
much simpler than existing modern codecs, such as the mu-law encoders [59], which are much
more common in modern VoIP systems. There have been prior known approaches to Additive
Homomorphic  Encryption,  such  as  Paillier  encryption  [60],  but  these  approaches  have  not  been 
practically  employed  to  support  encrypted  VoIP  mixing.  Further,  there  has  been  no  prior  work 
that  has  investigated  the  data  structures  required  to  support  end-to-end  encrypted  VoIP 
teleconferencing  with  homomorphic  mixing. 

There have been few other approaches to secure VoIP teleconferencing that provide
security properties such as end-to-end encryption. Most relevant is the work in [61],
which discusses a VoIP teleconferencing approach based on Secure Multi-Party Computation
(SMC) [62]. This prior SMC-based approach is also built by modifying the Mumble/Murmur
software, and our team received implementation advice from the authors of [61]. Unlike our
HE-based approach, which requires only one untrusted server for end-to-end encrypted VoIP
teleconferencing, the SMC-based approach in [61] requires that every participant in the
teleconference have at least one trusted server.

4.4  FPGA  accelerator  --  FHE  Processing  Unit  (FHEPU) 

4.4.1  VHDL  implementations  of  fast  modulus  arithmetic  using  Simulink  HDL 
Generator 

Software implementations of modulus usually use some form of trial division to compute the
remainder. Implementing modular arithmetic on integers with large bit widths in an efficient
manner requires the use of special numerical algorithms, such as
Montgomery Reduction [63] and Barrett Reduction [64]. Rather than dividing by
q, these algorithms scale the integers so that many of the divisions can be performed by a power of 2,
requiring only simple bit shifts. Our SHE scheme requires circuits for fast modulo addition and
multiplication (to directly implement the EvalAdd and EvalMult mentioned above). In addition,
our scheme relies heavily on the Chinese Remainder Transform (CRT) [65], which can be
implemented as an EvalMult of the input with a twiddle table, followed by an FFT [66] that uses
modulo integer arithmetic instead of complex arithmetic (also known as a Number Theoretic Transform or
NTT). Our implementation of this FFT uses a standard radix-2 'butterfly' operation, which uses
one addition, one subtraction and one multiply, all modulo q. Thus to implement a
CRT we need to implement modulo subtraction as well.
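
As a rough illustration of the butterfly described above, the following Python sketch computes one radix-2 butterfly over the integers modulo q. The function name, the decimation-in-frequency form, and the use of Python's % operator in place of a hardware reduction circuit are assumptions made for illustration; this is not the Simulink model itself.

```python
def butterfly(u, v, w, q):
    """One radix-2 butterfly modulo q (decimation-in-frequency form, assumed).

    u, v : the two inputs, each in [0, q-1]
    w    : the twiddle factor for this butterfly, in [0, q-1]
    Uses one addition, one subtraction and one multiplication, all modulo q.
    """
    top = (u + v) % q            # modular addition
    bottom = ((u - v) * w) % q   # modular subtraction, then modular multiply
    return top, bottom
```

With w = 1 the multiply disappears and only the modular add and subtract remain.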

Initially, our selection of lattice based SHE led us to look at relatively modest sized moduli, on
the order of twenty bits. An implementation using Montgomery Reduction based arithmetic was
built that was relatively efficient, requiring hardware multipliers on the order of 40 bits.
However, later research showed that for any reasonable security requirements our SHE scheme
would need O(64) bits for our modulus. Our implementation of Montgomery arithmetic in
Simulink required us to double our bit width to hold intermediate values represented in
Montgomery form. We found that there is an intrinsic limitation of 128 bit width in Simulink
even when using the fixed point toolbox. This meant that we could not compile our multipliers
for bit widths on the order of 64 bits.

Additionally, our early arithmetic models were all designed for a single value of modulus q to be
used for all operations. During the development of our SHE scheme we found that it was more
efficient to decompose large bit width numbers into residues with respect to a set of smaller related
moduli using the Double CRT representation. This resulted in far more efficient implementations. Thus our
circuits would need to operate with multiple (but not unlimited) values of q. In response to
these new requirements we eliminated Montgomery arithmetic entirely and took a simpler
approach to modulo addition and subtraction.
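
The idea of replacing one wide modulus by a tower of smaller moduli can be sketched as follows. This Python example shows only the integer residue layer of such a representation; the three small moduli and all function names are hypothetical and are not the parameters used in our scheme.

```python
from math import prod

def to_towers(x, moduli):
    """Represent x by its residues modulo each (pairwise coprime) tower modulus."""
    return [x % q for q in moduli]

def tower_mul(a, b, moduli):
    """Multiply two tower representations: one small modular multiply per tower."""
    return [(x * y) % q for x, y, q in zip(a, b, moduli)]

def from_towers(residues, moduli):
    """Recombine residues into a value modulo Q = product of the moduli (CRT)."""
    Q = prod(moduli)
    total = 0
    for r, q in zip(residues, moduli):
        Qi = Q // q
        total += r * Qi * pow(Qi, -1, q)   # modular inverse via pow (Python 3.8+)
    return total % Q
```

Each tower only ever needs arithmetic at the width of its own small modulus, which is why a single circuit parameterized by a tower index can serve all of them.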

Figure  16  shows  the  Matlab  code  and  the  resulting  Simulink  block  for  performing  a  streaming 
EvalAdd.  This  circuit  requires  the  inputs  to  be  constrained  to  less  than  a  given  modulus  q  (which 
is  the  native  representation  for  our  FHE  scheme).  The  model  can  operate  on  one  pair  of  inputs 
every  clock  cycle.  For  simplicity,  the  model  shown  does  not  have  any  additional  pipeline 
registers  (which  are  modeled  as  unit  delays),  but  they  can  be  easily  added  to  the  model  in  order 
to  increase  the  maximum  clock  speed  of  the  resulting  VHDL,  at  a  cost  of  additional  latency.  In 
our applications we expect to process streams of input on the order of several thousand entries,
so this additional pipeline latency is trivial. Additionally, rather than taking q itself as an input, our circuits take an index into a
lookup table for the value of q (i.e. which tower index we are operating on), since the bit width
of this index is much smaller than that of q (4 or 5 bits vs. 64 bits).


Figure  17  shows  the  Matlab  and  resulting  Simulink  block  for  modulo  subtraction.  The  same 
comments  about  pipelining  the  circuit  apply. 
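
The compare-and-correct approach to modulo addition and subtraction can be sketched in Python as below. This is not the Matlab code of Figures 16 and 17; the table contents, the names Q_TABLE, ring_add, ring_sub and stream are illustrative assumptions.

```python
# Hypothetical lookup table of tower moduli; the circuit receives only the index.
Q_TABLE = [97, 113, 193]

def ring_add(a, b, tower):
    """Modular addition for inputs already in [0, q-1]."""
    q = Q_TABLE[tower]
    s = a + b                 # 0 <= s <= 2q - 2, so one correction suffices
    return s - q if s >= q else s

def ring_sub(a, b, tower):
    """Modular subtraction for inputs already in [0, q-1]."""
    q = Q_TABLE[tower]
    d = a - b                 # -(q-1) <= d <= q-1, so one correction suffices
    return d + q if d < 0 else d

def stream(op, xs, ys, tower):
    """Apply op to one pair of inputs per step, as a streaming circuit would."""
    return [op(x, y, tower) for x, y in zip(xs, ys)]
```

Because both inputs are bounded by q, a single comparison and one conditional add or subtract replace any division.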

Modulo multiplication is a much more complicated operation than either add or subtract, even if
the input multiplier and multiplicand are bounded by q. This is because the outputs of
the latter two are bounded by a small integer multiple of q, and can be brought back into the range
[0, q-1] by simple comparisons and subtraction of q. However, for multiplication the product
is approximately twice the bit width of q, so this trick cannot be used. Furthermore, we
determined in our earlier work that the VHDL code generated by Simulink for large
multiplications is not automatically pipelined, so the resulting (large bit width) multiplies
severely restrict the resulting clock rates of the circuits. To address these two constraints, we
adopted a recently developed interleaved modular multiplication based on a generalized Barrett
reduction [67]. This multiplier has the following properties:

■ Long words of bit length L can be represented by n smaller words of bit length S (e.g. four 16
bit words to represent a 64 bit modulus).

■  The  multiplication  is  performed  in  n  stages,  where  each  stage  performs  one  modulo 
multiplication  that  is  L+S  bits  long.  The  stage  can  be  pipelined  to  perform  one  modulo 
multiply  per  clock  cycle. 

■  Each  stage  has  a  Barrett  modulus  performed  on  the  partial  product,  which  reduces  overall  bit 
growth  of  the  partial  products  to  L+S.  Each  stage  requires  3  multiplies,  and  all  divisions 
required  by  the  Barrett  algorithm  are  implemented  as  simple  bit  shifts. 

■  One  circuit  can  support  multiple  moduli  towers.  All  parameters  that  are  specific  to  a  given 
modulus  tower  can  be  stored  in  lookup  tables  and  indexed,  in  the  same  manner  as  q  is  for  our 
add  and  subtract  circuits. 
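
The properties listed above can be illustrated with a simplified Python sketch of an interleaved multiply with a Barrett reduction in every stage. It is not the algorithm of [67] as implemented in our circuit: it uses a plain (untruncated) Barrett estimate, and the names barrett_reduce and interleaved_mulmod, as well as the choice k = L + S + 1, are assumptions made so that the sketch is easy to check.

```python
def barrett_reduce(x, q, mu, k):
    """Reduce 0 <= x < 2**k modulo q, given mu = floor(2**k / q).

    The only 'division' is a right shift by k bits.
    """
    q_hat = (x * mu) >> k          # estimate of x // q, too small by at most 1
    r = x - q_hat * q              # therefore 0 <= r < 2q
    return r - q if r >= q else r

def interleaved_mulmod(a, b, q, L=64, S=16):
    """Compute (a * b) mod q by consuming b in n = L/S words of S bits each."""
    assert L % S == 0 and q < (1 << L) and 0 <= a < q and 0 <= b < q
    n = L // S
    k = L + S + 1                  # every partial result stays below 2**k
    mu = (1 << k) // q             # per-modulus constant (a lookup-table entry)
    mask = (1 << S) - 1
    acc = 0
    for i in reversed(range(n)):   # most significant word of b first
        word = (b >> (i * S)) & mask
        acc = barrett_reduce((acc << S) + a * word, q, mu, k)
    return acc
```

In this sketch each stage performs three multiplications (a * word, x * mu and q_hat * q), and the partial result never grows beyond roughly L + S bits before it is reduced again.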

Figure 18 shows the structure of our resulting multiplier for S=16 and L = 64 = 4*S, resulting in
a four stage, 64x64 bit multiplier. This model will produce compilable VHDL code, i.e. no
single operation exceeds 128 bits in width. The red box in the figure shows the model for a
single stage in the pipeline. All stages use the same model. This implementation uses 47 stages
of pipelining within a single stage in order to achieve fast clock rates.

Once the models were maximally pipelined, we identified several large (64 x 64 bit) product
blocks within our RingMul Barrett multiplication implementation as being the slowest
components, and re-implemented them as an expanded multiplication model consisting of four
parallel 32x32 bit products and a pipelined accumulation of partial sums. These are shown in the
green and blue boxes in the figure. This further increased the achievable clock speeds. We
discovered that adding additional pipelines of length four, both before and after each resulting
smaller product block, further allowed the Xilinx optimizer to break these product blocks into
multiple DSP48E multipliers in a distributed fashion. This allowed the RingMul circuit to
perform at speeds in excess of 350 MHz, well in excess of our target 200 MHz.

Figure 18: The structure of the Simulink HDL-ready, four-stage Barrett 64x64 bit modulo multiply
primitive. (Figure labels: "4 stages shown", "47 stage".)
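
The expansion of one 64x64 bit product into four 32x32 bit products plus an accumulation of partial sums can be sketched in Python as follows. The function name and the order of accumulation are illustrative assumptions; the pipelining registers of the actual model are not represented.

```python
MASK32 = (1 << 32) - 1

def mul64_via_32(x, y):
    """Form a 64x64 -> 128 bit product from four independent 32x32 bit products."""
    x_hi, x_lo = x >> 32, x & MASK32
    y_hi, y_lo = y >> 32, y & MASK32
    # Four smaller products that can be computed in parallel.
    p_ll = x_lo * y_lo
    p_lh = x_lo * y_hi
    p_hl = x_hi * y_lo
    p_hh = x_hi * y_hi
    # Accumulate the partial sums with the appropriate shifts.
    return p_ll + ((p_lh + p_hl) << 32) + (p_hh << 64)
```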


The  resulting  Barrett  multi-word  circuit  supports  32  different  moduli  (i.e.  towers  up  to  32  in 
size).  Furthermore,  the  implementation  is  strictly  agnostic  to  the  particular  FPGA  technology 
used,  and  is  easily  tunable  in  the  generation  software  to  re-optimize  for  a  different  FPGA 
technology. 

4.4.2  VHDL  implementations  of  fast  forward  and  inverse  CRT  using  Simulink 
HDL  Generator 


The  workhorse  function  for  our  scheme  is  the  CRT  and  its  inverse,  both  of  which  rely  heavily  on 
modulo  arithmetic.  We  have  developed  a  Simulink  model  for  performing  a  fast  CRT,  based  on 
the  modulo  arithmetic  primitives  discussed  above.  We  implemented  the  Number  Theoretic 
Transform  using  one  of  the  standard  pipeline  decimation  in  frequency  FFT  architectures,  known 
as  the  Radix  2,  Multipath  Delay  Commutator  [68]. 

The  fundamental  structure  of  the  Simulink  model  that  performs  a  modulo  arithmetic  FFT  (NTT) 
is  identical  to  a  complex  version  that  computes  the  standard  FFT.  The  only  difference  is  in  the 
Simulink  subsystem  model  that  implements  the  radix  2  butterfly  (due  to  the  use  of  our  modular 
arithmetic function). In fact, the flexibility of the Simulink approach allowed us to debug the
model using complex input and complex butterflies, and then use the exact same structure for the
modulo arithmetic FFT (NTT) with only a change to the butterfly subsystem block.
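
To make the structure concrete, here is a minimal software sketch of a radix-2 decimation-in-frequency NTT in Python. It is an in-place, non-pipelined formulation and not the Multipath Delay Commutator architecture itself; the function name and the parameters omega (a primitive N-th root of unity modulo q) and q are assumptions for illustration.

```python
def ntt_dif(a, omega, q):
    """Radix-2 decimation-in-frequency NTT of a (length a power of two).

    omega must be a primitive len(a)-th root of unity modulo q.
    The result is returned in bit-reversed order.
    """
    a = list(a)
    n = len(a)
    length = n
    w_stage = omega                      # twiddle generator for the current stage
    while length >= 2:
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                u, v = a[start + j], a[start + j + half]
                a[start + j] = (u + v) % q
                a[start + j + half] = ((u - v) * w) % q
                w = (w * w_stage) % q
        # The next stage uses the square of this generator, so its twiddles
        # are the even-indexed twiddles of the current stage.
        w_stage = (w_stage * w_stage) % q
        length = half
    return a
```

Swapping the modular butterfly in the inner loop for a complex one would turn the same loop structure into an ordinary FFT. In the last stage the only twiddle used is 1, so that stage needs no multiplications.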


Figure  19:  Simulink  HDL-ready  streaming  pipelined  CRT  Structure 


Figure 19 shows the structure of this pipelined CRT. The design trades off area for processing
speed. For an N-point transform, Nstages = log2(N) radix-2 butterflies are required (though the
last butterfly does not require multiplies). Additionally, (3/2)N - 2 delay elements are required for
the shuffle blocks.

Note  that  this  implementation  results  in  the  fastest  possible  computation  rate  of  the  CRT  for  a 
given  FPGA  clock  speed,  with  one  pair  of  output  samples  being  generated  every  clock  cycle.  By 
concatenating several input vectors together sequentially, we can keep the pipeline full and, once
the pipeline has filled up, run the circuit at 100% efficiency.

Note  that  the  only  difference  between  the  forward  and  inverse  CRT  is  whether  the  NTT  is  pre  or 
post  multiplied  with  a  special  “Twiddle  vector”  (different  from  the  NTT/FFT  twiddles).  We 
programmed  our  VHDL  wrapper  that  feeds  the  data  to  the  NTT  and  RingMul  components  so  that 
this  pre  or  post  multiplication  is  achieved  in  a  pipelined  manner  (to  be  illustrated  in  detail  later  in 
Figure  26). 

The  input  and  output  data  needs  to  be  presented  to  the  circuit  in  two  parallel  streams,  with  the 
top  stream  containing  the  first  half  of  the  input  vector  and  the  bottom  stream  containing  the 
second  half.  The  resulting  output  is  in  bit  reverse  order.  Rather  than  implement  this  in  Simulink 
we  incorporated  these  data  manipulations  into  the  VHDL  wrapper  around  the  NTT  portion  of  the 
CRT  as  shown  in  Figure  20.  These  wrappers  use  double  buffering  to  efficiently  keep  the  pipeline 
full. 
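
The two data manipulations handled by the wrapper, splitting the input into two parallel streams and undoing the bit-reversed output order, can be sketched in Python as follows. The function names are hypothetical, and the double buffering used in the VHDL wrapper is not modeled.

```python
def split_streams(vec):
    """Split an input vector into the top (first half) and bottom (second half) streams."""
    half = len(vec) // 2
    return vec[:half], vec[half:]

def bit_reverse_index(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def to_natural_order(bitrev_vec):
    """Reorder a bit-reversed output vector (length a power of two) into natural order."""
    n = len(bitrev_vec)
    bits = n.bit_length() - 1
    return [bitrev_vec[bit_reverse_index(k, bits)] for k in range(n)]
```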

One shortcoming of this design is that it utilizes a large amount of FPGA area. Thus for a given
bit width of q and maximum tower size T, there is a maximum number of stages that will fit into
a given FPGA. Another major shortcoming is that each stage has its own “twiddle memory”. In
practice, every stage’s twiddle memory consists of exactly the even-addressed entries of the
twiddle memory preceding it in the pipeline. As a result, we were not able to fit this circuit as
is into our candidate FPGA chip for high security applications where we need to perform CRT
operations on vectors of up to 2^14 in length. There was simply not enough RAM in the FPGA
chosen.

Our solution was to develop a custom hand coded twiddle RAM to replace the Simulink
VHDL twiddle RAM. Figure 20 depicts the final implementation of the CRT module. The data
stream is first fed through the “Top/Bottom” block, which divides the single data stream into two
data streams, and then through a set of 13 stages. Each stage has two 64-bit wide input streams
and two 64-bit wide output streams. The inputs of each stage, except the first stage (labeled
“Stage 4096”), are the outputs from the previous stage. Stages 4096 down to 4 also contain a
bypass capability, depicted on the left of Figure 20. When a particular stage’s bypass is enabled,
that stage’s input stream is passed directly to its output stream, unmodified. This allows for
variable-length CRT operations. If none of the stages are bypassed, a 16384-point CRT is
performed. If the first stage (labeled “4096”) is bypassed, an 8192-point CRT is performed. If the first
two stages are bypassed (labeled “4096” and “2048”), a 4096-point CRT is performed, and so on.
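
The relationship between bypassed stages and transform length can be expressed as a small Python helper. The stage labels beyond “4096” and “2048” are assumed to continue by halving, and the function name is hypothetical; only the three cases stated above are taken from the design description.

```python
MAX_POINTS = 16384

# Assumed labels for the 13 stages: 4096, 2048, ..., 2, 1.
STAGE_LABELS = [4096 >> i for i in range(13)]
# Only stages labeled 4096 down to 4 have a bypass.
BYPASSABLE = [label for label in STAGE_LABELS if label >= 4]

def stages_to_bypass(n_points):
    """Return the labels of the leading stages to bypass for an n_points transform."""
    if n_points <= 0 or n_points & (n_points - 1) or n_points > MAX_POINTS:
        raise ValueError("n_points must be a power of two no larger than 16384")
    count = (MAX_POINTS // n_points).bit_length() - 1   # log2(MAX_POINTS / n_points)
    if count > len(BYPASSABLE):
        raise ValueError("transform too short for the available bypass stages")
    return BYPASSABLE[:count]
```

Each additional bypassed leading stage halves the transform length.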


Figure 20: Inside the NTT, showing the customized VHDL wrapper that performs the required I/O
reordering, twiddle RAM sharing, and stage bypass. (Figure labels: In, 64; 64, Out.)

Each stage has an associated “Twiddle” Read Only Memory (ROM) table, implemented with
FPGA block RAM. The first stage has the largest table, and each successive stage’s twiddle
table is half the size of the previous stage’s twiddle table. Unfortunately, the FPGA does not
have enough block RAM resources for each stage to have its own table. To work around this,
the first four stages share a table. One reason this is possible is that the twiddle tables have been
designed with the property that each twiddle table contains the same values as the values at the even
addresses in the previous table. For example, the values at addresses 0, 1, 2, and 3 in table 2048
are the same as the values at addresses 0, 2, 4, and 6 in table 4096. Similarly, the values at
addresses 0, 1, 2, and 3 in table 1024 are the same as the values at addresses 0, 2, 4, and 6 in table 2048,
which themselves are the same as the values at addresses 0, 4, 8, and 12 in table 4096. Therefore,
the twiddle table for the first stage actually contains the values for all the other tables, assuming
it is addressed appropriately. The reason each stage would otherwise need its own table is that each stage
needs to access its table simultaneously in order for the CRT module to achieve the desired data
throughput. Fortunately, the FPGA block RAM primitives are dual port, which means that it is
possible to simultaneously read from two independent addresses. By clocking the dual port
block RAM at twice the rate (200 MHz) of the rest of the CRT logic (100 MHz), we were able to
construct a virtual quad port block RAM. For each cycle of the CRT clock (100 MHz), two
addresses are presented to each port of the large twiddle table, one on each cycle of its faster
clock (200 MHz). As a result, the large twiddle table is able to sustain a throughput of four
independent reads per CRT clock cycle, and saves the resources required by the twiddle tables for
stages 2048, 1024, and 512. After these savings, as shown in Figure 21, the design still uses
98% of the available FPGA block RAM (BRAM) resources.
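
The address mapping into the shared table, and the way two ports clocked at twice the rate serve four reads, can be modeled in a few lines of Python. This is a behavioral sketch only; the function names are hypothetical and no actual block RAM timing is represented.

```python
def shared_address(stage_size, addr, base_size=4096):
    """Map an address in a smaller stage's table to the shared (largest) table.

    Each halving of the table size doubles the address stride, because every
    table holds the even-addressed entries of the table before it.
    """
    stride = base_size // stage_size
    return addr * stride

def quad_port_read(table, addrs):
    """Serve four reads in one slow clock cycle: two fast cycles x two ports."""
    assert len(addrs) == 4
    results = []
    for fast_cycle in range(2):
        port_a = table[addrs[2 * fast_cycle]]       # port A read in this fast cycle
        port_b = table[addrs[2 * fast_cycle + 1]]   # port B read in this fast cycle
        results += [port_a, port_b]
    return results
```

Here the four addresses would be those requested by stages 4096, 2048, 1024 and 512 during one cycle of the slower clock.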

Figure 21: Virtex 7 485T FPGA resource utilization for the FHE accelerator


4.4.3  VHDL  implementation  of  Ring  Round 

In addition to our implemented ring and CRT operations, we have implemented a function used
in our Composed Eval Mult (CEM) called Round. The CEM function is implemented in our
software in C and executes several FPGA primitives. First, a RingMul operation performs the
multiply. Next, a key-switch operation is performed, consisting of another RingMul of the product
with a hint variable defined by the cryptosystem. Then, a modulus reduction operation is
performed on the single highest tower entry of the result, which consists of an inverse CRT and
this Round operation. Since the Round does several modulus operations not otherwise available,
we implemented it in hardware.

Figure 22 shows the Simulink model of the Round, which consists of a modified EvalMult
operation (using a modified set of moduli qt), and a pair of operations selected by the range of
the result which ensure the output is bounded within an appropriate range. The operations are
performed in a pipelined manner as well.

Figure 22: Simulink model of Round function

The result of the rounding operation is a pair of new ring vectors that are then in turn applied to
each remaining tower entry to reduce the noise accumulated by the initial product. These vectors
are first processed with a series of RingAdds, RingSubs and a CRT using each of the
corresponding ring moduli. The end result is that the highest tower ring is eliminated from the
ciphertext, and the overall noise of the system remains at a usable (i.e. decryptable) level.

4.4.4  Further  optimizations 

Mathworks determined that by selecting synchronous vs. asynchronous reset in the Simulink to
HDL generation parameters, the resulting VHDL mapped more efficiently into the registers built
into  the  DSP48E  blocks  on  the  Virtex  7  FPGA,  increasing  the  efficiency  of  the  resulting  mapped 
VHDL  by  eliminating  extra  routing  traces. 

The previous circuits were designed to run at a minimum speed of 100 MHz. We determined that
adding explicit pipelining stages in the form of delay lines to the model enabled the Xilinx tools
to better optimize FPGA mapping during place and route. Specifically, pipelines
were added between arithmetic operations within the RingAdd (4 stages), RingSub (3 stages),
RingMul (188 stages) and RingButterfly (195 stages) models. Since our target ring size can be as
large as 2^14, and all the towers of a variable are processed sequentially, the delay incurred from
filling the pipeline is expected to be minimal.
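
A back-of-the-envelope check of this claim can be written as a short Python function. It assumes one ring element enters the pipeline per clock cycle and that the pipeline is filled once per variable; both are simplifying assumptions, and the function name is hypothetical.

```python
def pipeline_efficiency(ring_size, towers, pipeline_depth):
    """Fraction of clock cycles spent producing results rather than filling the pipeline."""
    samples = ring_size * towers          # elements streamed back to back
    return samples / (samples + pipeline_depth)

# Deepest pipeline mentioned above (195 stages) against a 2**14 element ring.
single_tower = pipeline_efficiency(2 ** 14, 1, 195)
full_tower = pipeline_efficiency(2 ** 14, 32, 195)
```

Under these assumptions the fill time is on the order of one percent of the stream for a single tower, and far less when many towers are streamed in sequence.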

Several of our circuits utilize lookup tables, both for storing the moduli q, and for storing various
twiddle table entries for the CRT and its inverse. Our previous direct implementation of the table
lookup using the Simulink Lookup function block maps the resulting ROM directly into gate
circuitry. This can increase the place and route time drastically for very large tables, and can also
result in less efficient circuits. Mathworks determined that placing an additional delay line
with a “ResetType = none” HDL block property lets the Xilinx tools map the table to block RAM in
the FPGA, which is a more efficient utilization of resources on the chip.

4.4.5 FPGA hardware selection


Our  FPGA  selection  was  driven  by  the  need  for  a  large  number  of  hardware  multipliers  on  the 
chip.  Due  to  cost  constraints  we  wanted  to  use  a  commercial  off-the-shelf  FPGA  board  for  our 
experiments.  Our  selection  of  the  Virtex  7  VC707  evaluation  board  was  driven  by  the  following 
sizing requirements. Our final ring size of 2^14 requires 87% of the DSP48 blocks available on
this  board’s  Virtex  7  485T  chip.  Additionally,  we  require  on-board  DDR  memory  for  storage  of 
encrypted  variables,  and  high  speed  Ethernet  and  PCI  Express  (PCIe)  interfaces.  All  these  are 
present  on  the  VC707.  Note  in  this  document  both  PCI  and  PCIe  are  used  to  refer  to  the  PCI 
Express  interface  that  connects  the  FPGA  board  with  the  PC  motherboard  when  it  is  hosted  in  a 
PC. 

4.4.6  FHE  Accelerator  system  architecture 

The design goal of our FPGA system was to be able to operate as an attached processor to
accelerate the FHE primitive operations in a way that allows one to chain together several
operations in order to minimize the overhead due to data transfer. An attached processor design
was developed in which a software programmable microcontroller would manage I/O
communications with the host via Ethernet or PCI memory map, manage on-board data storage
in the form of an encrypted register file, and manage data transfer to and from the FHE primitive
modules in as efficient a manner as possible. We decided to use the Xilinx Platform Studio
Microblaze soft core processor and AXI4 interconnect architecture to implement the attached
FHE processor. We found that this supported Ethernet based operation well. For PCI operation
we determined that the Microblaze was not required, so we removed the soft core and the Ethernet
controller from the FPGA for those builds.

A high level diagram of the FHE Accelerator Architecture is shown in Figure 23. At the top of
the diagram is a block labeled “Applications”. This represents user applications, such as Matlab,
that use the FHE accelerator code. User applications utilize the accelerator by communicating
with the FHE Kernel Driver, which exposes a Linux “character device”. The FHE character
device behaves a bit like a file, or a network device. The user application sends commands and
data to the FHE Kernel Driver by writing packets of data to the FHE character device, and
likewise the user application receives responses and status via data packets from the FHE
character device. The packet-based communication protocol allows the user application
to be agnostic to whether the FHE Accelerator is on a local PCIe bus or on a separate
network-connected device.

Figure 23: FHE hardware accelerator architecture


The core FHE Kernel Driver code is written in portable “C” code. Therefore, although we run
this code in the Linux kernel, it can also run on a soft-core Microblaze processor inside the
FPGA. This portable driver code instantiates a set of registers, in memory, which store the FHE
input and output data, as well as intermediate results. When the code runs on the Microblaze, these
registers exist in the FPGA-connected DDR memory, and when the code runs in the Linux
kernel, these registers exist in PC memory. Figure 23 depicts the PCI hosted case where the code
runs in the Linux kernel. For the case where the FPGA is not on a PCI bus, the registers exist on
the FPGA evaluation board in DDR3 RAM, and the FHE Kernel Driver code is split between the
Linux kernel and the Microblaze on the FPGA. Data packets (described below) are sent via
Gigabit Ethernet rather than to mapped PCI memory.

Note that, in Figure 23, the (AXI Full Interface) arrow between the Input DMA module and the
PCIe interface originates at the DMA module, even though the data flows in the other direction.
This is because both DMA modules are Masters, meaning they initiate the data transfers.

One DMA Master (Input) reads from PC memory, and the other (Output) writes to PC memory.
The DMA modules also have an AXI slave port, for control, which is written to and read from by the
PC (via the PCIe interface).

Figure 24 shows a system block diagram of the FPGA system for both the Ethernet and PCI
hosted configurations. The Xilinx Platform Studio enables us to implement our FHE primitives as
streaming co-processors on the AXI bus. An AXI4-Lite bus is used to set control parameters of
our Ring operation circuits, such as ring size and tower size.


Figure 24: System Block Diagram showing major components and the AXI4 interconnect for
the various implementations of the FHEPU. (Figure annotation: AXI4 bus is matched to max
DDR3 memory bus I/O.)


Figure  25  depicts  the  interior  of  the  FHE  Primitives  module  (purple)  in  Figure  24.  The  module  is 
fed  by  two  256-bit  wide  data  streams  in  the  AXI  clock  domain  (shown  on  top),  but  internally 
processes  two  64-bit  streams  in  a  different  clock  domain  called  the  “math  clock”  domain.  In  this 
design,  the  math  clock  runs  at  100  MHz,  and  the  AXI  clock  runs  at  125  MHz.  Separating  the 
FHE math clock domain from the PCIe clock domain gives us quite a bit of design flexibility.

The  frequency  of  the  AXI  clock  is  determined  by  the  speed  of  the  PCIe  interface.  The  math 
clock  frequency,  however,  can  be  set  arbitrarily  so  long  as  the  FHE  Core  logic  meets  timing  at 
that  frequency.  We  are  conservative  with  the  math  clock  frequency  due  to  the  fact  we  have 
some  internal  (CRT  twiddle)  memories  that  are  run  at  twice  the  math  clock  frequency  (described 
in  Figure  20).  As  the  FHE  Core  logic  eventually  is  rewritten  and  becomes  more  efficient,  we 
will  be  able  to  increase  the  rate  of  the  math  clock.  The  AXI  clock  may  also  be  doubled  if  we 
move  to  a  PCIe  Gen  3  architecture. 
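
For a sense of scale, the raw per-stream data rates on the two sides of the clock domain crossing can be compared with simple arithmetic, shown here in Python. The calculation ignores protocol overhead, stalls and DMA behavior, all of which are outside this sketch, and the function name is hypothetical.

```python
def raw_stream_mbps(width_bits, clock_mhz):
    """Raw data rate of one stream in Mbit/s: one word of width_bits per clock cycle."""
    return width_bits * clock_mhz

axi_stream = raw_stream_mbps(256, 125)    # one 256-bit stream in the AXI clock domain
math_stream = raw_stream_mbps(64, 100)    # one 64-bit stream in the math clock domain
ratio = axi_stream // math_stream
```

With the figures quoted above, each AXI-side stream has five times the raw rate of the corresponding math-side stream.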

When running with Ethernet, the main AXI4 interconnect remains a 256 bit bus connecting the
DDR3 RAM with the various FHE primitives. In this mode, the I/O rate into and out of DDR3
memory limits the overall processing speed of the system.

Figure 25: Integration of FHE primitives with the AXI stream data streams.


Figure 26 shows the internal structure of the FHE core and how the input and output streams are
routed to the various FHE primitive functions. The core block performs one of several operations
on the data: Add, Sub, Mul, CRT, or Round. The CRT operation can perform either inverse or
forward CRTs of ring sizes from 16 to 16384 (in powers of 2). However, the CRT operation also
uses the Mul block. For forward CRTs, the CRT block is fed with input In0, and the CRT
module’s output is then multiplied with input In1 using the Mul module. Similarly, for inverse
CRTs, inputs In0 and In1 are fed into the Mul module, and the Mul output is fed into the CRT
module. This architecture conserves FPGA resources for two reasons. First, FPGA logic
resources are conserved since we would otherwise need to instantiate another Mul block inside
the CRT module. Second, since the Mul inputs for CRT operations (CRT Twiddle Table) arrive
from In1 (via DMA from DDR3 RAM or PC memory depending on hosting), they do not need to
be stored locally as ROM tables in the FPGA. This reduces the amount of FPGA block memory
used by the design.

Each of the operations in purple receives its input data pipelined first by ring elements, then by
tower indices. Thus all input and output for a complete double CRT is streamed in one
operation. The CRT modules require slightly different interfaces that change the order of the
input and output data as mentioned previously.

Figure 26: FHE Core. Ring operation pipelines are kept fed with two input data streams, and
produce one or two output data streams, all under direct DMA control from PCI or Microblaze.


4.4.7  Communication  between  the  Application  and  Kernel  Driver. 

Ethernet-Hosted  Mode 

The Xilinx Platform Studio is used to implement a Microblaze soft core processor when the
system is in an Ethernet-hosted configuration. The system architecture is based on the demo
hardware  self-test  example  that  is  provided  with  the  Xilinx  board.  The  software  architecture  is 
based  on  the  web-service  example  provided  with  the  Xilinx  Virtex  6  ML605  evaluation  kit, 
updated  with  the  Xilinx  SGMII  144  Ethernet  controller  (our  first  implementations,  as  we  later 
moved  to  the  Virtex  7  and  the  VC707  evaluation  board).  The  software  controlling  the  system  on 
the  Microblaze  is  written  in  C  code.  The  system  is  multithreaded  to  allow  the  use  of  Ethernet 
TCP/IP  socket  I/O.  A  network  thread  manages  socket  level  I/O  between  the  host  and  the  attached 
processor.  Another  thread  reads  the  incoming  messages  from  the  socket,  parses  the  commands 
received  and  dispatches  execution  to  various  subroutines. 

The DDR3 RAM is partitioned into a set of register data structures, as well as a set of internal
registers that store constants used in our encryption schemes. Each register can hold one encrypted
bit in the form of a two-dimensional vector of unsigned long longs allocated out of DDR
RAM. One dimension (the fastest index) is the ring size N and is software programmable. The
other dimension, the tower size, varies with the state of the register. Typically registers are
loaded into the FHE coprocessor with a fixed starting number of tower elements (up to
MAX_TOWER_SIZE = 32 elements).

The registers are allocated out of heap in the DDR3 RAM. There are three flavors of registers:
Input, Output and Scratch. This design decision was made to allow us to later segregate
I/O and scratch registers into different memory locations if that were to increase throughput. The
quantity of each register type is software defined at compile time, but there is usually a small
number of Input and Output registers and as many Scratch registers as will utilize all the
available heap space. Control structures mark the current tower size of each register and whether the
register is in use. Registers are aligned to 32-byte address boundaries
so that the AXI4 DMA engines can move the register data into and out of the FHE
primitives. This format allows the contents of an entire register (all used towers) to be streamed
with only one DMA transfer.
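The register layout described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the Microblaze C code: the names `Register`, `allocate` and `align_up` are hypothetical, each register is assumed to reserve room for the full MAX_TOWER_SIZE towers, and towers are assumed to be stored back to back so that all used towers form one contiguous block.

```python
from dataclasses import dataclass

MAX_TOWER_SIZE = 32   # from the text: up to 32 tower elements per register
WORD_BYTES = 8        # one unsigned long long per ring coefficient
ALIGNMENT = 32        # registers start on 32-byte boundaries for the DMA engines


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) // alignment * alignment


@dataclass
class Register:
    kind: str            # "input", "output" or "scratch"
    base: int            # byte offset of tower 0 within the heap
    ring_size: int       # N, the fastest-varying index
    tower_size: int = 0  # number of towers currently in use
    used: bool = False

    def tower_offset(self, tower):
        """Byte offset of the first coefficient of the given tower."""
        return self.base + tower * self.ring_size * WORD_BYTES

    def dma_length(self):
        """Bytes covered by a single DMA transfer of all used towers."""
        return self.tower_size * self.ring_size * WORD_BYTES


def allocate(heap_base, ring_size, counts):
    """Lay out registers one after another, each aligned to ALIGNMENT bytes."""
    registers, cursor = [], heap_base
    for kind, count in counts.items():
        for _ in range(count):
            cursor = align_up(cursor)
            registers.append(Register(kind, cursor, ring_size))
            # Assumption: every register reserves space for MAX_TOWER_SIZE towers.
            cursor += MAX_TOWER_SIZE * ring_size * WORD_BYTES
    return registers
```

Because the ring index varies fastest and towers follow one another in memory, the used part of a register is one contiguous byte range, which is what lets a single DMA descriptor cover it.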

A Linux Driver is used to interface with the user Application code. It translates the Application
Level text interface messages to binary format messages (both described later) and handles
all Ethernet I/O to the FPGA board.

PCI-Hosted  Mode 

The  Linux  Driver  for  the  PCI  Hosted  mode  is  written  to  emulate  the  buffer  I/O  of  the  Ethernet 
interface,  allowing  the  same  user  application  software  to  be  used  for  both  Ethernet  and  PCI 
operation. 

In  this  mode,  the  DDR3  is  not  used  at  all,  and  both  the  Microblaze  and  Ethernet  controllers  are 
removed  from  the  system  build  to  conserve  resources.  Registers  are  allocated  out  of  the  PCI 
Kernel  memory.  All  Microblaze  code  is  executed  in  Linux  Kernel  Space. 

Application  Level  Text  Interface 

The communication protocol between the user application code and the Linux Driver is message
based. The messages are in ASCII. The driver parses each processor instruction. Each instruction
starts with a keyword that defines the format of the rest of the instruction. The keywords are shown in
Table 6.


Table  6:  Application  Level  Control  Protocol  keywords 


| Keyword | Function |
|---|---|
| LOAD | Transfer the contents of the message (ASCII) into a particular Input register. |
| GET | Request the contents of a particular Output register to be loaded into a message buffer and sent back to the host. |
| STATUS | Generates a short report on the FPGA board console for debugging, showing the contents of all used registers and a listing of the current program loaded. |
| PROG | Loads a sequence of operations to be performed on the register data, in a simple assembly language. |
| RUN | Starts a software Finite State Machine to run the stored program to completion. |
| CRT, ICRT, CEM | A single command that will LOAD two registers, perform a forward CRT, inverse CRT or Composed EvalMult on them and GET the resulting output. Used for accelerating applications that only require these three operations. |
| RESET | Resets the system to its original state. |

The user application can string commands together to program the FPGA to operate on several
pieces of encrypted data in the form of an assembly language. The FPGA accelerator’s assembly
language has the syntax shown in Table 7.


Table  7:  Table  of  available  Opcodes  for  Application  program 


| Opcode | Example | Description |
|---|---|---|
| LOAD | R1 = LOAD(In0) | Moves data from an input register to a scratch register. All active tower elements are moved. |
| STORE | Out4 = STORE(R3) | Moves data from a scratch register to an output register. All active tower elements are moved. |
| RADD | R2 = RADD(R3, R4) | Sets up DMAs of the two input and one output registers to the RingAdd circuit. All active tower elements are processed in one large data flow. |
| RSUB | R2 = RSUB(R3, R4) | Same as RingAdd, except the RingSub circuit is the target of the DMAs. |
| RMUL | R2 = RMUL(R3, R4) | Same as RingAdd, except the RingMul circuit is the target of the DMAs. |
| CRT | R3 = CRT(R1, R2) | Same as RingAdd, except the input and output registers are used as endpoints for pairs of DMA transfers, each moving one half of the ring data. Note that the second input register is used as a scratch register, so its contents are destroyed. |
| ICRT | R2 = ICRT(R4, R5) | Same as CRT, except an inverse CRT circuit is used. |
| EMULC | R2 = EMULC(R3, R4) | Executes a ComposedEvalMult in software, which in turn executes several Ring primitives (see below). Note that the output register is one tower smaller than the input registers. |

A simple example program is given in Table 8. The program first moves encrypted data
from input register 0 to scratch register 0, then repeats the process for a second input variable,
from input register 1 to scratch register 1. It then computes a RingAdd, RingSub and RingMul
using the two inputs, storing the results in scratch registers 2, 3 and 4 respectively. It then stores
those three results in output registers 0, 1 and 2 respectively.


Table  8:  Sample  FHEPU  program 


    R0   = LOAD(In0)
    R1   = LOAD(In1)
    R2   = RADD(R0, R1)
    R3   = RSUB(R0, R1)
    R4   = RMUL(R0, R1)
    Out0 = STORE(R2)
    Out1 = STORE(R3)
    Out2 = STORE(R4)

In typical system operation, the user executes two LOAD commands to fill
input registers 0 and 1 with encrypted data (the encryption being done on the secure
host). The user then executes a RUN command so that the homomorphic operations are run
on the unsecure FPGA processor. Subsequent GET commands transfer the
encrypted result data back to the host. Finally, decryption is done on the secure
host.
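To make the program flow concrete, the following Python sketch interprets the assembly syntax of Table 7 on a host-side software model. It is an illustration only: the tower moduli are hypothetical, and it assumes that register contents are already in evaluation (CRT) form, so that RingAdd, RingSub and RingMul reduce to coefficient-wise arithmetic modulo each tower modulus. The real system performs these operations in the FPGA pipelines.

```python
import re

# Hypothetical small tower moduli; real parameters are chosen by the scheme.
MODULI = [97, 193]

# Coefficient-wise ring operations, valid under the evaluation-form assumption.
OPS = {
    "RADD": lambda a, b, q: (a + b) % q,
    "RSUB": lambda a, b, q: (a - b) % q,
    "RMUL": lambda a, b, q: (a * b) % q,
}

# Matches lines such as "R2 = RADD(R0, R1)" or "Out0 = STORE(R2)".
LINE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\(\s*(\w+)\s*(?:,\s*(\w+)\s*)?\)\s*$")


def run(program, inputs):
    """Execute program text on a dict of input registers; return the outputs."""
    regs = dict(inputs)
    for line in program.strip().splitlines():
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"cannot parse: {line!r}")
        dest, opcode, arg1, arg2 = match.groups()
        if opcode in ("LOAD", "STORE"):
            # Register-to-register move of all active towers.
            regs[dest] = [tower[:] for tower in regs[arg1]]
        else:
            op = OPS[opcode]
            regs[dest] = [
                [op(x, y, q) for x, y in zip(t1, t2)]
                for t1, t2, q in zip(regs[arg1], regs[arg2], MODULI)
            ]
    return {name: value for name, value in regs.items() if name.startswith("Out")}


PROGRAM = """
R0 = LOAD(In0)
R1 = LOAD(In1)
R2 = RADD(R0, R1)
R3 = RSUB(R0, R1)
R4 = RMUL(R0, R1)
Out0 = STORE(R2)
Out1 = STORE(R3)
Out2 = STORE(R4)
"""
```

Calling `run(PROGRAM, {"In0": ..., "In1": ...})` with two registers of two towers each plays the role of the LOAD, RUN and GET sequence described above, with the returned dictionary standing in for the output registers.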

4.4.8  Communication  between  the  Kernel  Driver  and  the  FPGA 

The FHE Kernel Driver delegates certain primitive operations to the FPGA. This FPGA control
is made possible by a set of AXI-LITE register-based interfaces, which exist in the FPGA but
are memory-mapped (made accessible) to the FHE Kernel Driver (on the PC) via the PCIe
interface. The FHE Kernel Driver configures the FHE module in the FPGA for a particular
operation by writing to its AXI-LITE interface. Next, the input and output DMA modules are
configured by the FHE Kernel Driver via their AXI-LITE interfaces. At this point, the input
DMA module fetches input data directly from PC memory via the PCIe interface and sends the
data to the FHE module. The FHE module processes the data and sends its output to the output
DMA module. The output DMA module writes the data directly back to PC memory via the
PCIe interface.

The driver communicates with the FPGA over either Ethernet or PCIe by sending and receiving
packets. Each packet starts with a 32-byte header that contains eight 4-byte fields, as shown in Table
9.


Table  9:  Driver  Packet  header  format 


| Field | Size | Description |
|---|---|---|
| totalLength | 32 bits | Contains the total length of the packet, in bytes |
| cmd | 32 bits | Defines the command (packet type) |
| aux1 | 32 bits | Command-specific data |
| aux2 | 32 bits | Command-specific data |
| aux3 | 32 bits | Command-specific data |
| aux4 | 32 bits | Command-specific data |
| aux5 | 32 bits | Command-specific data |
| aux6 | 32 bits | Command-specific data |

All 32-bit numeric fields are in little-endian byte order. The first two header fields are used by
every packet, while the next six fields (aux1-aux6) are interpreted differently depending on the
command. Some packets also contain data following the header. If a packet contains data
following the header, its totalLength field will be greater than the header size; otherwise it will
be equal to the header size. Packets from the driver back to the user application start with the
same packet header but have different values in the “cmd” field. The “cmd” field may
have the values shown in Table 10.


Table  10:  Driver  Packet  command  enumeration 


| Command | Enumeration | Description |
|---|---|---|
| LOAD | 0 | This command instructs the driver to load a register with a set of values. The aux1 field contains the index of the register to load, and the aux2 field contains the tower index within that register. The number of values to load is determined by the number of bytes left in the packet. The remaining packet bytes contain a set of 64-bit little-endian encoded integers. The user application should always load tower index 0 first, as the number of elements in tower index 0 determines the ring size. After tower index 0 of a register is loaded, the driver will not allow other tower indices to be loaded unless the number of elements to be loaded is equal to the ring size (defined by the number of elements in tower index 0). The ring size may be changed by re-loading tower index 0. |
| PROG | 1 | The driver may execute a set of operations on behalf of the user application. These operations are defined by a small “program” sent to the driver. Each instruction in the program is encoded as a 32-bit word, and this command is used to load the program, one instruction at a time. The aux1 field contains the instruction index (where it resides in the program), with the first instruction residing at index 1. The aux2 field contains the instruction itself. The driver keeps track of how many instructions were written so it knows how many instructions to execute. An existing program may be cleared by writing a new instruction to index 1. |
| GET | 2 | This command causes the driver to send back the contents of an output register (with a respGET packet, described below). The aux1 field contains the output register number, and the aux2 field contains the tower number. |
| RUN | 3 | This command causes the driver to execute the program set by one or more PROG commands (described above). |
| VERIFY | 4 | This command causes the driver to verify the contents of an output register. The aux1 field contains the output register index, and the aux2 field contains the tower index. The packet data (following the packet header) is compared to the register contents by the driver to see if it matches. The driver will then send a respVERIFY packet back to the user application to let the application know whether the data matched. If the data size in this packet does not match the register’s ring size, the driver outputs an error message to its standard output, but does not send a response packet. |
| DUMP | 5 | This is a debug command that causes the driver to dump the contents of a register to its standard output. The aux1 field contains the register index, and the aux2 field contains the tower index. |
| HINT | 6 | This command is used to load a hint register. The aux1 field contains the hint register index, and the aux2 field contains the tower index. |
| respGET | 8 | This response, sent from the driver back to the user application, is the response to a GET command. Its header is followed by the data in the register requested by a GET command from the application. The aux1 field is set to the register index. |
| respVERIFY | 9 | This response is sent from the driver to the user application to report the result of a VERIFY command. The aux1 field contains 0 if the register contents matched the data in the VERIFY command, and 1 if the data did not match. |
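The header layout of Table 9 and the LOAD command of Table 10 can be sketched with Python's standard `struct` module. This is an illustrative host-side model rather than the driver's own code: the helper names (`build_packet`, `build_load`, `parse_packet`) are hypothetical, and setting unused aux fields to zero is an assumption, since the text does not say what unused fields contain.

```python
import struct

# Eight little-endian 32-bit fields: totalLength, cmd, aux1..aux6 (32 bytes).
HEADER = struct.Struct("<8I")

# Command enumerations taken from Table 10.
CMD_LOAD, CMD_PROG, CMD_GET, CMD_RUN = 0, 1, 2, 3


def build_packet(cmd, aux=(), payload=b""):
    """Prefix payload with a header whose totalLength covers header and data."""
    fields = list(aux) + [0] * (6 - len(aux))  # assumption: unused aux fields are 0
    return HEADER.pack(HEADER.size + len(payload), cmd, *fields) + payload


def build_load(register, tower, values):
    """LOAD packet: aux1 = register index, aux2 = tower index, then 64-bit values."""
    payload = struct.pack(f"<{len(values)}Q", *values)
    return build_packet(CMD_LOAD, (register, tower), payload)


def parse_packet(packet):
    """Split a packet into (cmd, [aux1..aux6], payload bytes)."""
    total, cmd, *aux = HEADER.unpack_from(packet)
    if total != len(packet):
        raise ValueError("totalLength does not match the packet size")
    return cmd, aux, packet[HEADER.size:total]
```

For example, `build_load(2, 0, [1, 2, 3, 4])` yields a 64-byte packet: the 32-byte header followed by four 64-bit coefficients, which under the rules above would set the ring size of register 2 to 4.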

The user application may send the driver a small program using the PROG and RUN commands
described above. Each program instruction is encoded as a 32-bit little-endian word, using the
format described in Table 11.


Table  11:  Driver  Packet  instruction  encoding 


| Bits | Field Name | Description |
|---|---|---|
| 31-24 | Opcode | Encodes the type of operation to be performed. See Table 13 for a description of the available operations. |
| 23-22 | Return Value Type | Encodes the register type of the return value. |
| 21-16 | Return Value Index | Encodes the register number of the return value. |
| 15-14 | Argument 2 Type | Encodes the register type of the second argument. |
| 13-8 | Argument 2 Index | Encodes the register number of the second argument. |
| 7-6 | Argument 1 Type | Encodes the register type of the first argument. |
| 5-0 | Argument 1 Index | Encodes the register number of the first argument. |

There  are  four  possible  values  for  the  register  type  fields.  These  are  outlined  in  Table  12: 


Table  12:  Driver  Packet  register  encoding 


| Register Type | Enumeration | Description |
|---|---|---|
| None | 0 | This encoding is used if an argument does not refer to a register. Its interpretation depends on the opcode. |
| Input | 1 | Input registers are filled with data from LOAD commands, i.e. with data from user applications. |
| Output | 2 | Output registers are sent to the user application as responses to GET commands. They typically contain the results of operations. |
| Scratch | 3 | Scratch registers contain the inputs and outputs of FPGA operations. |

The  opcodes  are  summarized  in  Table  13  below.  Note  that  some  encodings  have  been  omitted. 
This  is  because  some  opcodes  were  defined  but  never  implemented.  The  following  table  only 
contains  the  implemented  opcodes: 

Table  13:  Driver  Packet  opcode  encoding 

| Opcode | Enumeration | Description |
|---|---|---|
| LOAD | 0 | Move the contents of an input register to a scratch register. Argument 1 contains the input register address, and the return value contains the scratch register address. |
| STORE | 1 | Move the contents of a scratch register to an output register. Argument 1 contains the scratch register address, and the return value contains the output register address. |
| RADD | 5 | Perform the ADD operation on two scratch registers. Arguments 1 and 2 contain the two input scratch register addresses for the operation, and the return value contains the address of the output register. |
| RSUB | 6 | Perform the SUB operation on two scratch registers. Arguments 1 and 2 contain the two input scratch register addresses for the operation, and the return value contains the address of the output register. |
| RMUL | 7 | Perform the MUL operation on two scratch registers. Arguments 1 and 2 contain the two input scratch register addresses for the operation, and the return value contains the address of the output register. |
| CRT | 8 | Perform the CRT operation on a scratch register. Argument 1 contains the input scratch register address for the operation, and the return value contains the address of the output register. |
| ICRT | 9 | Perform the ICRT operation on a scratch register. Argument 1 contains the input scratch register address for the operation, and the return value contains the address of the output register. |
| ROUND1 | 10 | Perform the ROUND operation on a scratch register. Argument 1 contains the input scratch register address for the operation, and argument 2 contains the tower index (register type None). The return value contains the address of the output register. |
| ROUND2 | 11 | Same as the ROUND1 operation, but returns the second result (the ROUND operation returns two results). |
| EMULC | 12 | Perform the EMULC operation on two scratch registers. Arguments 1 and 2 contain the two input scratch register addresses for the operation, and the return value contains the address of the output register. |
| END | 13 | Stop executing the program. |
| NOP | 14 | No-operation. Does no work. |
| MOD | 15 | Perform the MOD operation on a scratch register. Argument 1 contains the input scratch register address for the operation, and the return value contains the address of the output register. |
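The instruction format of Tables 11 to 13 can be exercised with a small encoder and decoder. The sketch below is written in Python for illustration and is not the driver's implementation. It assumes each operand occupies one byte, with the two-bit register type in the upper bits and a six-bit register index below it (bits 21-16, 13-8 and 5-0), and the function names are hypothetical.

```python
# Opcode and register-type enumerations from Tables 13 and 12.
OPCODES = {"LOAD": 0, "STORE": 1, "RADD": 5, "RSUB": 6, "RMUL": 7,
           "CRT": 8, "ICRT": 9, "ROUND1": 10, "ROUND2": 11,
           "EMULC": 12, "END": 13, "NOP": 14, "MOD": 15}
REG_NONE, REG_INPUT, REG_OUTPUT, REG_SCRATCH = 0, 1, 2, 3


def pack_operand(reg_type, index):
    """One operand byte: 2-bit register type above a 6-bit register index."""
    if not 0 <= reg_type < 4:
        raise ValueError("register type must fit in 2 bits")
    if not 0 <= index < 64:
        raise ValueError("register index must fit in 6 bits")
    return (reg_type << 6) | index


def encode(opcode, ret=(REG_NONE, 0), arg1=(REG_NONE, 0), arg2=(REG_NONE, 0)):
    """Build the 32-bit instruction word: opcode, return value, arg 2, arg 1."""
    return ((OPCODES[opcode] << 24) | (pack_operand(*ret) << 16)
            | (pack_operand(*arg2) << 8) | pack_operand(*arg1))


def decode(word):
    """Return (opcode name, return operand, argument 1, argument 2)."""
    names = {value: name for name, value in OPCODES.items()}

    def split(byte):
        return ((byte >> 6) & 0x3, byte & 0x3F)

    return (names[(word >> 24) & 0xFF], split((word >> 16) & 0xFF),
            split(word & 0xFF), split((word >> 8) & 0xFF))
```

Under these assumptions, `R2 = RADD(R0, R1)` on scratch registers encodes as `encode("RADD", (REG_SCRATCH, 2), (REG_SCRATCH, 0), (REG_SCRATCH, 1))`, which gives `0x05C2C1C0`. That word would travel in the aux2 field of a PROG packet, with aux1 holding the instruction index starting at 1.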

4.4.9  Performance  results 

During  our  evaluations  of  FPGA  acceleration  of  the  KWS  application  we  noticed  that  RingAdd 
does  not  actually  benefit  from  acceleration,  simply  because  our  C  implementation  is  very 
efficient,  and  the  additional  overhead  of  data  transfer  to  and  from  the  FPGA  eliminates  any 
potential  benefit.  Our  accelerations  primarily  benefit  any  functions  using  CRT.  We  focus 
primarily  on  the  CRT  and  Bootstrap  (which  has  only  its  CRTs  accelerated). 

We evaluated the performance of our FPGA co-processor implementation from multiple
perspectives. In particular, we evaluated both the individual performance improvements of the
CRT operation running in isolation on the FPGA and running the CRT in the context of a
broader use-case for encrypted KWS.

The CRT text search operation was run to compare 1) a native interpreted Matlab runtime of the
CRT running on the CPU of the FPGA systems, 2) a mex (i.e. compiled Matlab) version of the
CRT running on the PROCEED Deathstar environment (64 cores) and 3) calls to the FPGA
co-processor to run the CRT operation. All of the experiment test harnesses were run in
interpreted Matlab, with the variable being how the CRT was executed. We did not include
FPGA setup time in our analysis. We did include the time to submit the CRT jobs to the FPGA,
and the response times, in our experimental analysis.

Our  experimental  results  on  CRT  runtimes  for  the  Matlab  interpreted,  Matlab  CPU  compiled  and 
FPGA-supported  CRT  runtimes  for  various  ring  dimensions  can  be  seen  in  Figure  27.  We  found 
experimentally  that  the  value  of  the  ciphertext  moduli  has  little  impact  on  runtime,  as  long  as  the 
modulus  is  smaller  than  the  maximum  value  supported  by  the  CRT. 

In this figure we see that there is a multiple orders of magnitude performance improvement from
offloading CRT computations from the CPU to the FPGA. Note that the curves have a distinct
“J” shape, with slightly increased runtimes at smaller ring dimensions on the FPGA. This is a
result of the FPGA communication times and is a common artifact of FPGA co-processor
behavior. An additional view of the FPGA speed-up as compared to the Matlab and mex
runtimes can be seen in Figure 28. It can be clearly seen that the FPGA implementation is 1600
times faster than our fastest single CPU version. It is 35 times faster than the Deathstar version.


Figure  27:  CRT  absolute  runtimes. 

We attempted to use the FPGA to support our bootstrapping operation, but we ran into
challenges due to parameter selection issues. In particular, we found that in order to support the
necessary depth of computation and larger plaintext moduli needed for bootstrapping, we needed
to support parameters with bit widths larger than 64 bits for the most secure ring sizes. This
would have required rewriting all our FPGA code, which was not possible within the remaining
project scope. However, we took a straightforward approach to the software implementation and
attempted an FPGA implementation that supported solely the CRT operations on the FPGA, with
smaller ring sizes that could still be supported by our initial <=64-bit parameter settings.

Because  we  limited  ourselves  to  a  lower  ciphertext  modulus  than  necessary,  there  remains 
further  research  needed  to  fully  push  this  capability  into  operational  production.  This  “parameter 
engineering”  for  bootstrapping  will  be  early  low-hanging  fruit  for  any  future  development  effort. 

We collected initial results to evaluate bootstrapping performance with only the CRT accelerated
on the FPGA. The runtimes can be seen in the graph in Figure 29. Figure 30 shows the speedup
for bootstrapping as a function of ring dimension. We see a less dramatic performance
improvement than when solely running the CRT on the FPGA, as in the earlier experimental
results, because a larger share of the bootstrapping computation runs outside the FPGA. Many
of the operations we are still running on the CPU could also be out-sourced to an FPGA as
additional low-hanging fruit for future development work.
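The smaller end-to-end gain is what Amdahl's law predicts when only part of a workload is accelerated. The Python sketch below states that relationship. The function name is hypothetical and the fractions used in the usage note are illustrative values, not measurements from this program.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: end-to-end speedup when only a fraction of the runtime
    (here, the share spent in CRT) is accelerated by kernel_speedup."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)
```

For instance, if a hypothetical 90% of the original runtime were spent in CRT and the CRT became 1000 times faster, `overall_speedup(0.9, 1000.0)` gives just under 10, because the remaining 10% of CPU work dominates. Moving more operations onto the FPGA raises the accelerated fraction and hence the ceiling.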

Figure 29: Bootstrapping runtime performance with the CRT running on the FPGA. (Series: FPGA bootstrapping without Matlab overhead; single-threaded x86-64 compiled bootstrapping.)

4.5  Encrypted keyword search FHE results and discussion

We initially run our KWS implementation at a low security level (δ = 1.08) using SHE
computing to enable the email system to be interactive with fast response times. Our initial
implementation uses a ring dimension of n = 512 and encrypts emails with a supported depth of
computation d = 12. This results in an effective ciphertext modulus q represented with 430
bits and provides a security level of less than AES-128 in terms of work-factor.

To  evaluate  the  relative  performance  speed-up  of  the  FPGA  co-processor  with  only  CRT 
acceleration  in  a  more  real-world  setting,  we  developed  both  compiled  and  FPGA  accelerated 
implementations  of  our  string  searching  algorithms. 

The compiled implementation supported all string search operations in mex Matlab, a
C-compiled version of Matlab. The FPGA co-processor implementation ran almost all of the string
search operations in the slower native Matlab interpreter, except for the CRT operations, which
were run on the FPGA co-processor.

To set a baseline of performance, we ran the runtime experiments on a ring dimension of
n=2048, text corpora 1, 4, 16, 64 and 256 characters long (corresponding to the length
of modern text messages) and keywords 1, 2, 3, 4 and 5 characters long. We supported
all of ASCII in our search operations. We selected n=2048 because the runtime of the compiled
string search at higher dimensions was too long to collect meaningful data.

Figure 31 and Figure 32 show the runtime of the string search operation for the compiled string
search. As we can see, the runtime grows nearly linearly with both the keyword length and
corpus length.
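A plaintext analogue helps explain this near-linear growth. The Python sketch below assumes that the search tests the keyword against every alignment in the corpus without early exit, as a computation on encrypted data cannot branch on the values it compares. It is an illustration of the cost model only, not the encrypted algorithm used in this work, and the function names are hypothetical.

```python
def match_positions(corpus, keyword):
    """Return every offset where keyword occurs, checking all alignments."""
    k = len(keyword)
    hits = []
    for start in range(len(corpus) - k + 1):
        # No early exit: all k character comparisons are always evaluated.
        equal = [corpus[start + i] == keyword[i] for i in range(k)]
        if all(equal):
            hits.append(start)
    return hits


def comparison_count(corpus_len, keyword_len):
    """Character comparisons made by match_positions for the given lengths."""
    return max(corpus_len - keyword_len + 1, 0) * keyword_len
```

With a 256-character corpus and a 5-character keyword, `comparison_count(256, 5)` is 1260. The count is proportional to the keyword length and, when the keyword is much shorter than the corpus, close to proportional to the corpus length.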

Figure 33 and Figure 34 show the runtime of the string search operation for the FPGA string
search. As we can see, the runtime grows nearly linearly with both the keyword length and
corpus length, similar to the compiled version of the operation, but substantially faster.

We created plots to show the relative speed-up of using the FPGA co-processor in Figure 35 and
Figure 36. We see that the speed-up is relatively constant at 50x (1.5 orders of magnitude)
despite variations in the length of keywords and corpora. We found that this 50x speed-up was
relatively consistent across ring dimensions in broader experiments at lower security levels.


Figure 31: Runtime of the compiled string search operation vs. length of corpus (n = 2048, keyword lengths 1 to 5).

Figure 33: Runtime of the FPGA accelerated string search operation vs. length of keyword (n = 2048).


We intend to continue evaluating the performance of the string search operations. For
example, we plan to extend the experiments we have run to larger keyword
and corpus sets. We also plan to generate data over more parameter settings to assess how ring
dimension affects the FPGA co-processor in comparison to the compiled operations. For
example, we will assess performance up to the n=16384 ring dimension and up to 10 character
keywords and 1000 character text corpora. Early indications are that ring dimension has a lesser
impact on the runtime of the FPGA-based co-processor, and we expect it to remain much faster than
the compiled version. We plan on generating these results and preparing them for publication in an
IEEE Transactions-type journal in the near future.


In conclusion, we have taken FHE from an early theoretical possibility to an actual
implementation on multiple platforms. Our three-pronged approach of theory, software and
hardware implementation has had many benefits we did not foresee at the outset. By working closely
with newly developed FHE schemes, we were able to continuously improve the capabilities of
our FHE scheme. At the outset we did not have a bootstrapping approach, yet eventually this
capability was not only developed but proved to be one of the most efficient approaches
available. By using a rapid prototyping approach to software development, we were able to
quickly implement working prototypes of our FHE scheme, and develop an understanding of
where the strengths and weaknesses were in our implementations. We were able to apply
software engineering to a relatively new set of algorithms, determining appropriate ways to
parallelize our operations in order to achieve speedup on multicore computers. Our approach
also allowed us to use state-of-the-art code generation approaches to generate embedded C code for
use on iPhone and iPod systems from the same code we used for prototyping. Targeting
hardware for our implementation drove our development in some fundamental ways. We knew
from the outset that we needed efficient implementations of large integer modulus arithmetic.
This helped drive our double CRT implementation, which keeps bit widths within reasonable
limits. As a side result we found that such implementations improved the performance of our
software as well.

Beyond  our  VoIP  functionality,  our  implementation  is  part  of  a  long-term  community  vision  to 
support  a  general,  practical  and  secure  computing  capability  through  a  layered  services 
architecture.  Part  of  our  vision  is  to  provide  software  interfaces  in  our  design  for  our  highly 
optimized  implementations  of  the  basic  FHE  operations  for  both  general  and  specific 
applications. 

Although we only utilize limited-depth SHE for VoIP, that encryption system design is a
scaled-down version of our FHE scheme design. Our design offers the possibility for a much more
general VoIP teleconferencing capability that incorporates signal detection and noise filtering
operations on the encrypted VoIP channels. This more general design would enable protection
against some of the more practical attacks that could be made by an adversary, such as noise
injection attacks, where an adversary inserts noise into a VoIP teleconferencing session to reduce
the ability of participants to hear one another. Using more general FHE capabilities, we could
enable the untrusted cloud host to securely filter the encrypted VoIP signals before or after
mixing to reduce the impacts of insertion attacks.

A further aspect of our layered architecture vision is the ability to mix-and-match a computing
substrate at the server for much larger scalability and throughput. Our FHE design has been
ported to an FPGA, but with further refinement could be amenable to GPUs [14] operating as
FHE co-processors.

6  RECOMMENDATIONS 

Recommendations for future work in this area are organized into three areas. The first area for
future work is research into application-specific uses of FHE. Applications were one noticeable area
that did not get sufficient attention during our program. Initial candidate
applications of FHE had usually been selected by the Proceed researchers as being
potential areas for both FHE and SMP computation. However, we often found that the security
model for the application, while good for SMP, did not make sense for the postulated cloud-based
FHE implementation. It is apparent to us that as FHE becomes more usable through research
and development that streamlines the implementation and execution costs, more
applications will become apparent. We also found from our research that a
customized, application-specific data representation and FHE parameter selection was often more
conducive to FHE manipulation (such as in our VoIP application).

The second area for future research is FHE acceleration. Our approach of using
Matlab and Simulink to generate both VHDL and C code implementations of key FHE
functions was vital to rapidly prototyping and developing FHE accelerators. However, as FHE
technology has matured, these key functions have become relatively stable. It is now time for
native C/C++ and VHDL implementations of these functions, which will allow developers to
achieve even greater implementation efficiency.
Further, the dramatic improvements in GPU technology over the last few years have brought
wider bit width operations closer to reality. GPUs have a deeper penetration into the cloud
computing commodity market than FPGAs currently have, and as such may prove a valuable
implementation platform for acceleration of FHE. As with many complex systems, there are
numerous improvements that could be made to the FPGA-based FHE accelerator:

■  We  originally  chose  the  largest  tower  size  of  32  before  we  actually  had  a  viable  Bootstrap 
operation  derived.  We  have  since  determined  that  we  don’t  need  that  many  tower  entries  for 
FHE  at  high  security  levels.  Currently  we  are  running  the  internal  twiddle  table  block  RAM  at 
twice  the  frequency  of  the  rest  of  the  logic.  Through  a  combination  of  decreasing  the  number 
of  tower  vectors  down  from  32,  and  using  a  larger  size  FPGA  chip  with  more  block  RAM 
resources,  we  will  no  longer  need  to  run  the  block  RAM  at  twice  the  frequency  of  the  other 
logic,  and  we  will  be  able  to  increase  the  clock  frequency  of  the  other  logic. 

■ Currently, the ring math logic is all internally gated by a clock enable signal. This allows the
data to be fed through the ring math logic at the rate at which it arrives at the FPGA from the PCIe
interface. However, by updating the logic that drives the ring math logic so that it
appropriately buffers data and sends it through the ring logic without any breaks, we would
eliminate the need for the clock enable signals. This would further allow us to increase the ring
math clock frequency, since the large fan-out of the clock enable signal complicates placement
and routing.

■ The system currently runs 8-lane (x8) PCIe Gen1, which provides approximately 2 Gb/sec per
lane for a total throughput of 16 Gb/sec. If we move to PCIe Gen2, the throughput would double
to 4 Gb/sec per lane. The current hardware supports Gen2 PCIe, but the PCIe IP block
we used did not. It should be possible to move to Gen2 PCIe by either upgrading the
FPGA tools or developing our own PCIe interface logic.
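
The link arithmetic in the last item can be checked with a few lines of Python. This is only a sketch: the helper name is hypothetical, and the per-lane rates are the approximate figures quoted above rather than measured values.

```python
# Hypothetical helper: aggregate PCIe throughput from the approximate
# per-lane rates quoted in the text (not measured values).
def pcie_throughput_gbps(lanes: int, gbps_per_lane: float) -> float:
    """Return the aggregate link throughput in Gb/sec."""
    return lanes * gbps_per_lane

gen1_total = pcie_throughput_gbps(lanes=8, gbps_per_lane=2.0)  # current x8 Gen1 link
gen2_total = pcie_throughput_gbps(lanes=8, gbps_per_lane=4.0)  # same lanes at Gen2 rate
print(gen1_total, gen2_total)
```

Under these assumptions the same eight lanes would carry twice the data per second at Gen2, which is the doubling described above.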

The  third  area  for  future  work  is  in  the  development  of  a  portable,  extensible  library  of  FHE 
operations  that  are  easily  adapted  by  users  unfamiliar  with  FHE  details.  Such  a  library  should  be 
developed  using  state  of  the  art  programming  practices,  and  be  able  to  support  a  wide  range  of 
platforms:  single  and  multi-core  systems,  hardware  accelerators  such  as  FPGAs  and  GPUs,  and 
distributed  cloud-type  enterprise  systems.  This  would  enable  developers  to  develop  on  one 
platform  then  scale  their  work  easily  to  large  compute  environments. 

A.1 Appendix SIPHER SHE runtime data tables

Table 14: Encryption runtime (mSec) vs. depth of computation supported and Ring Dimension for p = 2

Dim. \ Depth        1       3       5       7       9      11      13      15      17      19
512              2.32    2.83    2.86    3.27    3.39    3.25    4.38    4.64    5.35    5.66
1024             3.87    5.33    5.17    5.98    5.68    5.63    6.94     8.4    9.04     9.2
2048             6.26    6.48    7.01    7.47    7.94    8.78    12.7   13.03   13.05   14.52
4096            12.08   12.27   13.04   14.87   17.38   17.65   20.73   17.46   21.57   22.13
8192            24.53   25.18   26.13   29.07   30.81   32.15   34.43   32.46   36.16    37.9
16384            52.3   55.02   58.05   59.71   60.29   61.98   63.44   64.99   69.96   72.89

Table 15: EvalAdd runtime (mSec) vs. depth of computation supported and Ring Dimension for p = 2

Dim. \ Depth        1       3       5       7       9      11      13      15      17      19
512              0.21    0.32    0.42    0.54    0.64    0.73    1.26    2.11    2.90    3.12
1024              0.3    1.04    0.47    0.57    0.72    0.74     1.4    2.72    2.85    2.93
2048             0.37    0.45    0.55    0.67     0.8       1    1.97       3    3.04    3.24
4096             0.56    0.65    0.74    0.91    1.92    2.07    2.25    2.43    3.73    3.54
8192             0.89    1.01     1.2    1.36    2.46     2.7    3.69    3.23    5.05    5.44
16384            1.58    1.82    2.12    2.39    3.99    4.19    4.27    4.77    7.16    7.29

Table 16: ComposedEvalMult runtime (mSec) vs. depth of computation and Ring Dimension for p = 2

Dim. \ Depth        1       3       5       7       9      11      13      15      17      19
512             16.03   22.73   23.32   22.65   22.87   22.96   24.35   25.24   25.37   25.78
1024            29.15   37.85   39.05   39.11   38.79   39.24   39.49   39.59   39.52   39.68
2048            49.17   66.31   66.77   67.41   67.15   68.38   68.22   69.27   69.45   71.09
4096            99.56  140.42  140.71  141.42  141.26  142.75  143.52  145.51  144.61  148.31
8192           196.83  279.37  280.42   284.4  283.98  285.69  289.59  286.55  292.69  295.69
16384          463.92  623.19  622.74  628.87  630.43  633.37  639.52   642.8   651.2  659.88

Table 17: Decryption runtime (mSec) vs. depth of computation supported and initial Ring Dimension for p = 2

Dim. \ Depth        1       3       5       7       9      11      13      15      17      19
512              0.40    0.26    0.13    0.14    0.10    0.10    0.06    0.06    0.06    0.06
1024             0.87    0.38    0.18    0.11    0.11    0.11    0.11    0.11    0.05    0.05
2048             1.92    0.84    0.38    0.38    0.22    0.22    0.22    0.22    0.12    0.12
4096             3.36     1.7    0.84    0.86    0.37    0.39    0.38    0.22    0.22    0.21
8192             7.22    3.43    1.67    1.72    0.85    0.87    0.86    0.87    0.39     0.4
16384           15.36    7.18    3.37    3.37    1.67    1.67    1.67    1.73    0.87    0.85

Following this page are full reprints of all SIPHER Publications listed in Section 8.

SIPHER: Scalable Implementation of Primitives for Homomorphic
EncRyption - FPGA implementation using Simulink

David  Bruce  Cousins,  Kurt  Rohloff,  Chris  Peikert,  Rick  Schantz 
Raytheon  BBN  Technologies,  Georgia  Institute  of  Technology 
{dcousins, krohloff, schantz}@bbn.com, cpeikert@cc.gatech.edu


Abstract 

Practical  Fully  Homomorphic  Encryption  (FHE)  would  be  a 
game-changing  technology  to  enable  secure,  general 
computation  on  encrypted  data,  e.g.,  on  untrusted  off-site 
hardware.  Recent  theoretical  breakthroughs  demonstrated 
the  existence  of  Fully  Homomorphic  Encryption  schemes 
[1,4].  However,  FHE  remains  impractical  because  current 
implementations  are  many  orders  of  magnitude  too  slow  for 
practical  use,  and  do  not  scale  well  to  the  very  large  keys 
and  ciphertexts  needed  to  assure  a  sufficient  level  of 
security.  A  new  DARPA  program  (PROCEED)  has  as  its 
focus  the  acceleration  of  various  aspects  of  the  FHE 
concept  toward  practical  implementation  and  use. 

In  this  paper  we  present  early  work  on  our  SIPHER  project, 
an  element  of  the  PROCEED  program,  whose  goal  is  to 
demonstrate  FHE  implementations  that  improve  the  state  of 
the  art  by  many  orders  of  magnitude.  As  part  of  our  activity 
we  are  developing  a  set  of  hardware  primitives  to  accelerate 
FHE implementations based on lattice problems [3]. As an
important aspect of our design methodology we use a state
of the art tool-chain offered by MathWorks to develop
FPGA circuits from Simulink models. We initially develop
prototype descriptions in Matlab that we re-implement in a
stream oriented, hardware implementable manner in
Simulink. The operations of the implementations are
compared to verify correctness. A conversion from
Simulink to VHDL is done in a completely automated
fashion using MathWorks' HDL Coder. This tool chain
provides  us  the  means  to  develop  our  primitives,  including 
cyclic  VHDL  based  FPGA  prototyping,  much  faster  than 
traditional  methods. 

* Sponsored by the Defense Advanced Research Projects Agency
(DARPA) and the Air Force Research Laboratory (AFRL) under Contract
No. FA8750-11-C-0098. The views expressed are those of the authors and
do not reflect the official policy or position of the Department of Defense
or the U.S. Government. Distribution Statement “A” (Approved for Public
Release, Distribution Unlimited).

Fully and Somewhat Homomorphic Encryption

Fully Homomorphic Encryption (FHE) holds the promise to
securely run arbitrary computations over encrypted data on
untrusted computation hosts [4]. The general FHE concept
of operations is that sensitive data is encrypted with a
public key, then sent to an untrusted computation host,
which can perform arbitrary computations on the encrypted
data without first needing to decrypt it. It has been shown
to be theoretically possible to evaluate arbitrary programs
using just two special purpose FHE operations, EvalAdd
and EvalMult, which roughly correspond to bitwise XOR
and AND gates operating on encrypted bits. A sequence of
these operations is run against the encrypted data, resulting
in an encryption of the output of the original program run
on the unencrypted data. This encrypted result can then be
sent back to the original client, who decrypts the result
using its secret key. The encrypted data is protected at all
times with reasonable security guarantees based on
computational hardness results.
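
To make the two-gate claim concrete, the sketch below builds a ripple-carry adder from nothing but XOR and AND. It operates on unencrypted bits, so it only illustrates how a program decomposes into the two operations; the function names are hypothetical stand-ins for EvalAdd and EvalMult, not part of any SIPHER interface.

```python
# Plaintext stand-ins for the two FHE operations (hypothetical names).
def xor_gate(a: int, b: int) -> int:
    """Addition modulo 2, the plaintext analogue of EvalAdd on bits."""
    return (a + b) % 2

def and_gate(a: int, b: int) -> int:
    """Multiplication modulo 2, the plaintext analogue of EvalMult on bits."""
    return (a * b) % 2

def full_adder(a: int, b: int, carry_in: int):
    """One-bit full adder built only from xor_gate and and_gate."""
    partial = xor_gate(a, b)
    total = xor_gate(partial, carry_in)
    # The two carry terms can never both be 1, so XOR acts as OR here.
    carry_out = xor_gate(and_gate(a, b), and_gate(carry_in, partial))
    return total, carry_out

def add_bits(x_bits, y_bits):
    """Add two equal-length little-endian bit lists; returns one extra carry bit."""
    carry = 0
    out = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)
    return out

# 5 + 6 with little-endian bits: 5 = [1, 0, 1], 6 = [0, 1, 1]
result_bits = add_bits([1, 0, 1], [0, 1, 1])
result_value = sum(bit << i for i, bit in enumerate(result_bits))
print(result_bits, result_value)
```

In an FHE setting each call to the two gate functions would be replaced by the corresponding homomorphic operation on ciphertexts, and the number of chained AND gates determines the multiplicative depth the scheme must support.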

FHE enables more secure and private computation, but its
efficiency must improve by multiple orders of magnitude
before it can be practical. Known
FHE  schemes  are  highly  inefficient  partly  because  they  are 
“noisy”  -  the  encryption  schemes’  ciphertext  is  a  function 
not  only  of  the  plaintext  and  encryption  keys  but  also  of  a 
noise  term.  The  amount  of  noise  in  a  ciphertext  rapidly 
increases  as  the  EvalAdd  and  EvalMult  operations  are 
performed,  and  after  too  many  such  operations  there  is  too 
much  noise  to  correctly  decrypt  the  ciphertext.  To  run 
larger  numbers  of  EvalAdd  and  EvalMult  operations,  FHE 
schemes  typically  address  the  accumulation  of  noise  with  a 
very  computationally  expensive  “recryption”  operation  that 
is  periodically  run  on  intermediate  ciphertexts  to  keep  the 
noise  at  a  level  that  still  permits  decryption. 
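
The effect of noise growth can be seen in a toy example. The sketch below does not use the lattice scheme described in this paper; it uses a deliberately insecure integer scheme in the style of [vGHV10], with hard-coded "random" values so the outcome is reproducible. All names and parameters are illustrative assumptions.

```python
# Toy, insecure integer scheme used only to illustrate noise growth.
# This is NOT the lattice-based scheme discussed in the text.
P = 10007  # toy secret key: an odd modulus, far too small for real security

def encrypt(bit: int, noise: int, multiple: int) -> int:
    """c = bit + 2*noise + P*multiple; noise and multiple would normally be random."""
    return bit + 2 * noise + P * multiple

def noise_term(c: int) -> int:
    """Centered residue of c modulo P, i.e. the accumulated 'bit + 2*noise' term."""
    r = c % P
    return r - P if r > P // 2 else r

def decrypt(c: int) -> int:
    """Correct only while the magnitude of the noise term stays below P/2."""
    return noise_term(c) % 2

c_one = encrypt(1, noise=4, multiple=5)    # noise term 9
c_zero = encrypt(0, noise=2, multiple=3)   # noise term 4

added = c_one + c_zero                      # noise terms add: 13
cubed = c_one * c_one * c_one               # noise terms multiply: 729
fourth = cubed * c_one                      # 6561 exceeds P/2, decryption fails

print(decrypt(added), decrypt(cubed), decrypt(fourth))
```

Three multiplications of an encryption of 1 still decrypt to 1, but the fourth pushes the noise term past P/2 and the result decrypts incorrectly. A recryption step exists to reset the noise before that point is reached.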

A  Somewhat  Homomorphic  Encryption  (SHE)  scheme 
supports  several  (but  not  unlimited)  EvalMult  and  EvalAdd 
operations  while  preserving  the  correctness  of  decryption. 
In other words, SHE schemes can support secure
computation  for  only  a  small  subset  of  programs.  Our 
development  approach  is  to  select  an  efficient 
implementation  of  an  SHE  scheme  which  can  be  converted 
into  a  full  FHE  scheme  with  the  addition  of  a  recryption 
(noise  reduction)  operation  and/or  other  supporting 
modifications.  This  enables  us  to  incrementally  develop 
SHE  results  using  modest  initial  resources. 

Although  there  have  been  some  initial  FHE 
implementations  [1],  there  have  been  no  practical 
implementations  that  can  be  used  for  effective  general 
computation.  Current  designs  of  FHE  schemes  rely  on 
operations  (i.e.  modular  arithmetic  with  an  enormous 
modulus)  that  are  inefficient  on  standard  CPU  architectures 
and  which  are  too  memory  intensive.  For  convenience  all  of 
these  previous  implementations  have  been  limited  by  their 
focus  on  CPUs  and  do  not  take  advantage  of  specialized 
parallel  computation  hardware  like  FPGAs. 

[Figure 1 layers, top to bottom: External users of SIPHER FHE implementation;
thin software layer manages high-level use of SIPHER; FHE operations (Encrypt, etc.);
software layer manages use of lattice-based primitives for FHE operations;
lattice-based primitives; thin software layer manages FPGA circuits for
lattice-based primitives; SIPHER on FPGA circuits.]

Figure 1: Conceptual diagram of system.

Figure  1  shows  our  vision  for  the  layered  services  we 
provide  in  our  FHE  implementation.  There  are  software 
interfaces  for  implementations  of  the  basic  FHE  operations 
(KeyGen,  Encrypt,  EvalAdd,  EvalMult,  Recrypt,  Decrypt) 
as  a  primitive  basis  for  constructing  more  general 
applications  on  encrypted  data.  Our  approach  to  the  FHE 
primitives  is  based  on  the  highly  efficient  lattice-based 
techniques  developed  by  one  of  our  investigators  [3],  which 
can  be  implemented  with  only  a  handful  of  core 
mathematical  primitive  operations  (see  Figure  2).  Many  of 
these  operations  are  closely  related  to  well-understood 
operations,  such  as  Fast  Fourier  Transforms,  which  we  are 
targeting  for  efficient  implementations  on  FPGAs.  The 
EvalAdd and EvalMult operations, for example, are simply
element-wise vector adds and multiplies taken modulo some
particular  prime  integer  q.  These  are  trivial  to  express  using 
Matlab:  c  =  mod(a+b,  q)  and  c  =  mod(a.*b,  q). 
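
For readers without Matlab, the same two one-liners can be sketched in plain Python. The function names, the modulus q = 17, and the sample vectors are illustrative assumptions, not values from our implementation.

```python
# Element-wise ring operations modulo a prime q (illustrative names and values).
def elementwise_add_mod(a, b, q):
    """Python analogue of the Matlab expression mod(a+b, q)."""
    return [(x + y) % q for x, y in zip(a, b)]

def elementwise_mult_mod(a, b, q):
    """Python analogue of the Matlab expression mod(a.*b, q)."""
    return [(x * y) % q for x, y in zip(a, b)]

q = 17
a = [3, 16, 5]
b = [15, 2, 7]
sum_mod_q = elementwise_add_mod(a, b, q)
product_mod_q = elementwise_mult_mod(a, b, q)
print(sum_mod_q, product_mod_q)
```

Because every output element depends only on the matching input elements, all positions can be computed in parallel, which is what makes these operations a natural fit for FPGA circuits.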


[Figure 2 diagram; recoverable labels: Encrypt (pk (ideal), Transform (CRT), ciphertext c);
Recrypt (new public key pk' (ideal), Random Ring Sample, "two-layer");
Addition in the Ring (c1, c2); Random Ring Samples; Complex Ring Addition and
Multiplication Circuit; Public Keys (pk, pk', ...); Secret Key sk.]

Figure 2: Primitives for a SHE scheme.

Figure 3: Internal Structure of CRT Primitive showing similarity to signal processing data flow.


We  are  leveraging  previous  work  on  signal  processing 
implementations  to  implement  the  primitives  (and 
consequently  the  FHE  scheme)  as  circuits  on  FPGAs.  The 
FPGAs  provide  highly  cost-effective  parallelism. 

One  of  our  primary  primitive  operations  is  the  Chinese 
Remainder  Transform  (CRT).  The  CRT  is  mathematically 
similar  to  the  Discrete  Fourier  Transform,  but  implemented 
using  modular  integer  (instead  of  complex)  arithmetic. 
Figure  3  shows  the  CRT  implementation  we  are  working 
with  that  is  structurally  very  similar  to  the  familiar 
processing  of  multi-dimensional  signal  data.  The 
similarities of our primitives with well understood
signal-processing operations that have been efficiently
implemented  in  FPGAs  give  us  confidence  toward 
developing  efficient  and  scalable  FPGA  implementations  of 
the  primitives. 

The  FFT  operation  in  Figure  4  is  similar  to  the  standard 
FFT  [2],  except  all  operations  are  done  in  modulo  q 
arithmetic. We were able to take MathWorks' example
Simulink  streaming  FFT  model  (Figure  4),  slightly  modify 
the  ordering  of  the  output,  and  easily  change  from  complex 
to  integer  arithmetic  simply  by  altering  the  input  and 
twiddle  factor  data  types.  Converting  to  modular  arithmetic 
is  also  straightforward. 
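
A minimal software sketch of an FFT carried out in modulo q arithmetic is shown below. It is a textbook recursive radix-2 transform, not our streaming Simulink design, and it does not reproduce the output reordering mentioned above. The modulus q = 17 and the root of unity 2 (which has order 8 modulo 17) are illustrative choices.

```python
# Textbook radix-2 FFT with all arithmetic done modulo a prime q.
# Requires Python 3.8+ for pow(x, -1, q) (modular inverse).
def fft_mod_q(values, root, q):
    """Transform a power-of-two-length list; root must be a primitive
    len(values)-th root of unity modulo q (it plays the twiddle-factor role)."""
    n = len(values)
    if n == 1:
        return list(values)
    root_squared = root * root % q
    even = fft_mod_q(values[0::2], root_squared, q)
    odd = fft_mod_q(values[1::2], root_squared, q)
    out = [0] * n
    twiddle = 1
    for k in range(n // 2):
        t = twiddle * odd[k] % q            # butterfly multiplication
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        twiddle = twiddle * root % q
    return out

def inverse_fft_mod_q(values, root, q):
    """Inverse transform: use the inverse root, then scale by 1/n modulo q."""
    n = len(values)
    inv_n = pow(n, -1, q)
    return [v * inv_n % q for v in fft_mod_q(values, pow(root, -1, q), q)]

q, root = 17, 2                      # 2 has multiplicative order 8 modulo 17
samples = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = fft_mod_q(samples, root, q)
recovered = inverse_fft_mod_q(spectrum, root, q)
print(spectrum, recovered)
```

The structure is the same as a complex FFT; only the data type and the twiddle factors change, which matches the small model changes described above.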


Figure  4:  Simulink  model  for  streaming  FFT. 

To implement the modular arithmetic efficiently in
hardware  we  have  taken  advantage  of  the  Montgomery 
Reduction  method  [5],  which  allows  one  to  express  mod  q 
operations  in  a  larger  basis  r,  which  can  be  a  power  of  two. 
So  while  the  bits  required  to  represent  the  integers  have 
grown,  all  arithmetic  operations  now  are  allowed  to  wrap 
around  on  overflow,  eliminating  the  need  to  do  a  costly 
modular  reduction  operation  in  the  hardware.  We 
implement  the  Montgomery  reduction  method  in  Simulink 
using  the  fixed  point  tool  box  (Figure  5).  The  additional 
complexity  this  adds  to  the  Simulink  model  for  our  ring 
operations  is  trivial.  Figure  6  shows  the  Montgomery 
reduction  steps  in  Red  and  Blue.  Red  steps  convert  to  the 
Montgomery  space,  and  are  done  once  for  each  input.  Any 
number  of  additions  can  be  done  without  the  need  for  a 
reduction  step.  Each  multiply  requires  one  reduction  step.  A 
final  reduction  step  converts  back  into  the  original  mod  q 
representation.  Our  modified  FFT  implementation  requires 
pre-computation  of  the  twiddle  factors  in  Montgomery 
representation  (no  real  time  impact),  one  reduction  step  for 
each  input  sample,  one  reduction  for  each  output  sample 
and  one  reduction  at  the  output  of  each  butterfly 
multiplication.  Since  all  reduction  is  done  using  a  pipelined 
approach,  there  is  no  additional  computation  time  added 
(just  latency). 
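
The reduction steps described above can be sketched in software as follows. This is a generic textbook Montgomery reduction written in Python, not the fixed-point Simulink model of Figure 5. The names, the modulus q = 97, and the choice r = 2^8 are illustrative assumptions.

```python
# Textbook Montgomery arithmetic; requires Python 3.8+ for pow(x, -1, m).
from typing import NamedTuple

class MontgomeryContext(NamedTuple):
    q: int          # odd modulus
    r_bits: int     # r = 2**r_bits, with r > q
    q_neg_inv: int  # -q^{-1} mod r
    r2: int         # r^2 mod q, precomputed (no real-time cost)

def make_context(q: int, r_bits: int) -> MontgomeryContext:
    r = 1 << r_bits
    return MontgomeryContext(q, r_bits, (-pow(q, -1, r)) % r, (r * r) % q)

def reduce_step(t: int, ctx: MontgomeryContext) -> int:
    """Return t * r^{-1} mod q for 0 <= t < q*r using only a mask, a shift,
    two multiplies and one conditional subtraction (no division by q)."""
    mask = (1 << ctx.r_bits) - 1
    m = ((t & mask) * ctx.q_neg_inv) & mask
    u = (t + m * ctx.q) >> ctx.r_bits
    return u - ctx.q if u >= ctx.q else u

def to_montgomery(a: int, ctx: MontgomeryContext) -> int:
    """One reduction step per input: a -> a*r mod q."""
    return reduce_step(a * ctx.r2, ctx)

def montgomery_multiply(a_m: int, b_m: int, ctx: MontgomeryContext) -> int:
    """One reduction step per multiply; the result stays in Montgomery form."""
    return reduce_step(a_m * b_m, ctx)

def from_montgomery(a_m: int, ctx: MontgomeryContext) -> int:
    """Final reduction step back to the ordinary mod q representation."""
    return reduce_step(a_m, ctx)

ctx = make_context(q=97, r_bits=8)
a, b = 45, 76
a_m, b_m = to_montgomery(a, ctx), to_montgomery(b, ctx)
product = from_montgomery(montgomery_multiply(a_m, b_m, ctx), ctx)
total = from_montgomery(a_m + b_m, ctx)   # addition needs no reduction of its own
print(product, total)
```

Because r is a power of two, the mask and shift inside the reduction step are free in hardware, leaving only multiplies and adds that pipeline well.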

Interim  Results 

Our  presentation  will  include  examples  of  our  primitives 
coded  in  Matlab  and  Simulink  and  examples  of  VHDL  code 
generated  by  the  HDL  coder.  We  will  also  be  able  to  show 
timing  results  from  Modelsim  based  simulations  of  the 
resulting  code. 

Figure 5: Simulink model for Montgomery Reduction


Figure  6:  Simulink  model  for  Ring  Multiply-Add 


Trapdoors for Lattices:

Abstract

We  give  new  methods  for  generating  and  using  “strong  trapdoors”  in  cryptographic  lattices,  which 
are  simultaneously  simple,  efficient,  easy  to  implement  (even  in  parallel),  and  asymptotically  optimal 
with  very  small  hidden  constants.  Our  methods  involve  a  new  kind  of  trapdoor,  and  include  specialized 
algorithms  for  inverting  LWE,  randomly  sampling  SIS  preimages,  and  securely  delegating  trapdoors. 
These  tasks  were  previously  the  main  bottleneck  for  a  wide  range  of  cryptographic  schemes,  and  our 
techniques  substantially  improve  upon  the  prior  ones,  both  in  terms  of  practical  performance  and  quality 
of  the  produced  outputs.  Moreover,  the  simple  structure  of  the  new  trapdoor  and  associated  algorithms  can 
be  exposed  in  applications,  leading  to  further  simplifications  and  efficiency  improvements.  We  exemplify 
the  applicability  of  our  methods  with  new  digital  signature  schemes  and  CCA-secure  encryption  schemes, 
which  have  better  efficiency  and  security  than  the  previously  known  lattice-based  constructions. 


1  Introduction 

Cryptography  based  on  lattices  has  several  attractive  and  distinguishing  features: 

• On the security front, the best attacks on the underlying problems require exponential $2^{\Omega(n)}$ time in
the main security parameter $n$, even for quantum adversaries. By contrast, for example, mainstream
factoring-based cryptography can be broken in subexponential $2^{\tilde{O}(n^{1/3})}$ time classically, and even in
polynomial $n^{O(1)}$ time using quantum algorithms. Moreover, lattice cryptography is supported by
strong  worst-case/average-case  security  reductions,  which  provide  solid  theoretical  evidence  that  the 
random  instances  used  in  cryptography  are  indeed  asymptotically  hard,  and  do  not  suffer  from  any 
unforeseen  “structural”  weaknesses. 

• On the efficiency and implementation fronts, lattice cryptography operations can be extremely simple,
fast and parallelizable. Typical operations are the selection of uniformly random integer matrices $A$
modulo some small $q = \mathrm{poly}(n)$, and the evaluation of simple linear functions like

$f_A(x) := Ax \bmod q$ and $g_A(s, e) := s^t A + e^t \bmod q$

on short integer vectors $x$, $e$.¹ (For commonly used parameters, $f_A$ is surjective while $g_A$ is injective.)
Often,  the  modulus  q  is  small  enough  that  all  the  basic  operations  can  be  directly  implemented  using 
machine-level  arithmetic.  By  contrast,  the  analogous  operations  in  number-theoretic  cryptography  (e.g., 
generating  huge  random  primes,  and  exponentiating  modulo  such  primes)  are  much  more  complex, 
admit  only  limited  parallelism  in  practice,  and  require  the  use  of  “big  number”  arithmetic  libraries. 
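
The two linear functions above can be evaluated with ordinary machine integers, as the following sketch shows. The tiny matrix, the modulus q = 13, and the short vectors are illustrative values chosen by hand; real parameters are far larger and the matrix is sampled uniformly at random.

```python
# Direct evaluation of f_A(x) = A x mod q and g_A(s, e) = s^t A + e^t mod q.
# A is stored as a list of n rows, each of length m (illustrative toy sizes).
def f_A(A, x, q):
    """Multiply the n-by-m matrix A by the short length-m vector x, modulo q."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) % q for row in A]

def g_A(A, s, e, q):
    """Compute s^t A + e^t modulo q for a length-n vector s and short length-m error e."""
    n, m = len(A), len(A[0])
    return [(sum(s[i] * A[i][j] for i in range(n)) + e[j]) % q for j in range(m)]

q = 13
A = [[1, 5, 7, 2],
     [3, 0, 11, 6]]       # n = 2, m = 4
x = [1, 0, -1, 1]          # short integer vector
s = [2, 3]
e = [1, 0, -1, 0]          # short error vector

print(f_A(A, x, q), g_A(A, s, e, q))
```

Every output coordinate is an independent dot product, so both functions parallelize trivially, and with a small q no big-number library is needed.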


In recent years lattice-based cryptography has also been shown to be extremely versatile, leading to a large
number of theoretical applications ranging from (hierarchical) identity-based encryption [GPV08, CHKP10,
ABB10a, ABB10b], to fully homomorphic encryption schemes [Gen09b, Gen09a, vGHV10, BV11b, BV11a,
GH11, BGV11], and much more (e.g., [LM08, PW08, Lyu08, PV08, PVW08, Pei09b, ACPS09, Ruc10,
Boy10, GHV10, GKV10]).

Not all lattice cryptography is as simple as selecting random matrices $A$ and evaluating linear functions
like $f_A(x) = Ax \bmod q$, however. In fact, such operations yield only collision-resistant hash functions,
public-key encryption schemes that are secure under passive attacks, and little else. Richer and more advanced
lattice-based cryptographic schemes, including chosen ciphertext-secure encryption, “hash-and-sign” digital
signatures, and identity-based encryption also require generating a matrix $A$ together with some “strong”
trapdoor, typically in the form of a nonsingular square matrix (a basis) $S$ of short integer vectors such that
$AS = 0 \bmod q$. (The matrix $S$ is usually interpreted as a basis of a lattice defined by using $A$ as a “parity
check” matrix.) Applications of such strong trapdoors also require certain efficient inversion algorithms for the
functions $f_A$ and $g_A$, using $S$. Appropriately inverting $f_A$ can be particularly complex, as it typically requires
sampling random preimages of $f_A(x)$ according to a Gaussian-like probability distribution (see [GPV08]).

Theoretical solutions for all the above tasks (generating $A$ with strong trapdoor $S$ [Ajt99, AP09], trapdoor
inversion of $g_A$ and preimage sampling for $f_A$ [GPV08]) are known, but they are rather complex and not very
suitable for practice, in either runtime or the “quality” of their outputs. (The quality of a trapdoor $S$ roughly
corresponds to the Euclidean lengths of its vectors; shorter is better.) The current best method for trapdoor
generation [AP09] is conceptually and algorithmically complex, and involves costly computations of Hermite
normal forms and matrix inverses. And while the dimensions and quality of its output are asymptotically
optimal (or nearly so, depending on the precise notion of quality), the hidden constant factors are rather large.
Similarly, the standard methods for inverting $g_A$ and sampling preimages of $f_A$ [Bab85, Kle00, GPV08]
are inherently sequential and time-consuming, as they are based on an orthogonalization process that uses
high-precision real numbers. A more efficient and parallelizable method for preimage sampling (which
uses only small-integer arithmetic) has recently been discovered [Pei10], but it is still more complex than is
desirable for practice, and the quality of its output can be slightly worse than that of the sequential algorithm
when using the same trapdoor $S$.

More compact and efficient trapdoors appear necessary for bringing advanced lattice-based schemes
to practice, not only because of the current unsatisfactory runtimes, but also because the concrete security
of lattice cryptography can be quite sensitive to even small changes in the main parameters. As already
mentioned, two central objects are a uniformly random matrix $A \in \mathbb{Z}_q^{n \times m}$ that serves as a public key, and an
associated secret matrix $S \in \mathbb{Z}^{m \times m}$ consisting of short integer vectors having “quality” $s$, where smaller
is better. Here $n$ is the main security parameter governing the hardness of breaking the functions, and $m$ is
the dimension of a lattice associated with $A$, which is generated by the vectors in $S$. Note that the security
parameter $n$ and lattice dimension $m$ need not be the same; indeed, typically we have $m = \Theta(n \lg q)$, which
for many applications is optimal up to constant factors. (For simplicity, throughout this introduction we
use the base-2 logarithm; other choices are possible and yield tradeoffs among the parameters.) For the
trapdoor quality, achieving $s = O(\sqrt{m})$ is asymptotically optimal, and random preimages of $f_A$ generated
using $S$ have Euclidean length $\beta \approx s\sqrt{m}$. For security, it must be hard (without knowing the trapdoor) to find
any preimage having length bounded by $\beta$. Interestingly, the computational resources needed to do so can
increase dramatically with only a moderate decrease in the bound $\beta$ (see, e.g., [GN08, MR09]). Therefore,
improving the parameters $m$ and $s$ by even small constant factors can have a significant impact on concrete
security. Moreover, this can lead to a “virtuous cycle” in which the increased security allows for the use
of a smaller security parameter $n$, which leads to even smaller values of $m$, $s$, and $\beta$, etc. Note also that
the schemes’ key sizes and concrete runtimes are reduced as well, so improving the parameters yields a
“win-win-win” scenario of simultaneously smaller keys, increased concrete security, and faster operations.
(This phenomenon is borne out concretely; see Figure 2.)

¹ Inverting these functions corresponds to solving the “short integer solution” (SIS) problem [Ajt96] for $f_A$, and the “learning
with errors” (LWE) problem [Reg05] for $g_A$, both of which are widely used in lattice cryptography and enjoy provable worst-case
hardness.


1.1  Contributions 


The first main contribution of this paper is a new method of trapdoor generation for cryptographic lattices,
which is simultaneously simple, efficient, easy to implement (even in parallel), and asymptotically optimal
with small hidden constants. The new trapdoor generator strictly subsumes the prior ones of [Ajt99, AP09],
in that it proves the main theorems from those works, but with improved concrete bounds for all the
relevant quantities (simultaneously), and via a conceptually simpler and more efficient algorithm. To
accompany our trapdoor generator, we also give specialized algorithms for trapdoor inversion (for $g_A$) and
preimage sampling (for $f_A$), which are simpler and more efficient in our setting than the prior general
solutions [Bab85, Kle00, GPV08, Pei10].

Our methods yield large constant-factor improvements, and in some cases even small asymptotic
improvements, in the lattice dimension $m$, trapdoor quality $s$, and storage size of the trapdoor. Because trapdoor
generation and inversion algorithms are the main operations in many lattice cryptography schemes, our
algorithms can be plugged in as ‘black boxes’ to deliver significant concrete improvements in all such
applications. Moreover, it is often possible to expose the special (and very simple) structure of our trapdoor directly
in cryptographic schemes, yielding additional improvements and potentially new applications. (Below we
summarize a few improvements to existing applications, with full details in Section 6.)

We now give a detailed comparison of our results with the most relevant prior works [Ajt99, AP09,
GPV08, Pei10]. The quantitative improvements are summarized in Figure 1.


Simpler, faster trapdoor generation and inversion algorithms. Our trapdoor generator is exceedingly
simple, especially as compared with the prior constructions [Ajt99, AP09]. It essentially amounts to just one
multiplication of two random matrices, whose entries are chosen independently from appropriate probability
distributions. Surprisingly, this method is nearly identical to Ajtai’s original method [Ajt96] of generating a
random lattice together with a “weak” trapdoor of one or more short vectors (but not a full basis), with one
added twist. And while there are no detailed runtime analyses or public implementations of [Ajt99, AP09],
it is clear from inspection that our new method is significantly more efficient, since it does not involve any
expensive Hermite normal form or matrix inversion computations.

Our specialized, parallel inversion algorithms for $f_A$ and $g_A$ are also simpler and more practically
efficient than the general solutions of [Bab85, Kle00, GPV08, Pei10] (though we note that our trapdoor
generator is entirely compatible with those general algorithms as well). In particular, we give the first parallel
algorithm for inverting $g_A$ under asymptotically optimal error rates (previously, handling such large errors
required the sequential “nearest-plane” algorithm of [Bab85]), and our preimage sampling algorithm for $f_A$
works with smaller integers and requires much less offline storage than the one from [Pei10].

Tighter parameters. To generate a matrix $A \in \mathbb{Z}_q^{n \times m}$ that is within negligible statistical distance of
uniform, our new trapdoor construction improves the lattice dimension from $m > 5n \lg q$ [AP09] down to
$m \approx 2n \lg q$. (In both cases, the base of the logarithm is a tunable parameter that appears as a multiplicative
factor in the quality of the trapdoor; here we fix upon base 2 for concreteness.) In addition, we give the first
known computationally pseudorandom construction (under the LWE assumption), where the dimension can
be as small as $m = n(1 + \lg q)$, although at the cost of an $\Omega(\sqrt{n})$ factor worse quality $s$.

Our construction also greatly improves the quality $s$ of the trapdoor. The best prior construction [AP09]
produces a basis whose Gram-Schmidt quality (i.e., the maximum length of its Gram-Schmidt orthogonalized
vectors) is loosely bounded by $20\sqrt{n \lg q}$. However, the Gram-Schmidt notion of quality is useful only
for less efficient, sequential inversion algorithms [Bab85, GPV08] that use high-precision real arithmetic.
For the more efficient, parallel preimage sampling algorithm of [Pei10] that uses small-integer arithmetic,
the parameters guaranteed by [AP09] are asymptotically worse, at $m > n \lg^2 q$ and $s \geq 16\sqrt{n \lg^2 q}$. By
contrast, our (statistically secure) trapdoor construction achieves the “best of both worlds”: asymptotically
optimal dimension $m \approx 2n \lg q$ and quality $s \approx 1.6\sqrt{n \lg q}$ or better, with a parallel preimage sampling
algorithm that is slightly more efficient than the one of [Pei10].

Altogether, for any $n$ and typical values of $q \geq 2^{16}$, we conservatively estimate that the new trapdoor
generator and inversion algorithms collectively provide at least a $7 \lg q \geq 112$-fold improvement in the
length bound $\beta \approx s\sqrt{m}$ for $f_A$ preimages (generated using an efficient algorithm). We also obtain similar
improvements in the size of the error terms that can be handled when efficiently inverting $g_A$.
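
As a rough sanity check on the $7 \lg q$ figure, the following Python sketch plugs the bounds quoted above into $\beta \approx s\sqrt{m}$. It assumes the prior fast-sampling parameters and the new parameters are taken exactly at their stated boundary values, which is a simplification; the function names are illustrative only.

```python
import math

def beta_prior(n, lg_q):
    # Prior construction with fast (parallel) sampling:
    # m > n lg^2 q and s >= 16 sqrt(n lg^2 q), taken at the boundary.
    m = n * lg_q ** 2
    s = 16 * math.sqrt(n * lg_q ** 2)
    return s * math.sqrt(m)

def beta_new(n, lg_q):
    # New statistically secure construction:
    # m ~ 2 n lg q and s ~ 1.6 sqrt(n lg q).
    m = 2 * n * lg_q
    s = 1.6 * math.sqrt(n * lg_q)
    return s * math.sqrt(m)

def improvement(n, lg_q):
    # Ratio of the two length bounds; algebraically (10 / sqrt(2)) * lg q,
    # so it does not depend on n.
    return beta_prior(n, lg_q) / beta_new(n, lg_q)
```

Under these assumptions the ratio is about $7.07 \lg q$, which for $\lg q = 16$ comes to roughly 113, in line with the conservative 112-fold estimate.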

New,  smaller  trapdoors.  As  an  additional  benefit,  our  construction  actually  produces  a  new  kind  of 
trapdoor  —  not  a  basis  —  that  is  at  least  4  times  smaller  in  storage  than  a  basis  of  corresponding  quality, 
and  is  at  least  as  powerful,  i.e.,  a  good  basis  can  be  efficiently  derived  from  the  new  trapdoor.  We  stress  that 
our  specialized  inversion  algorithms  using  the  new  trapdoor  provide  almost  exactly  the  same  quality  as  the 
inefficient,  sequential  algorithms  using  a  derived  basis,  so  there  is  no  trade-off  between  efficiency  and  quality. 
(This is in contrast with [Pei10] when using a basis generated according to [AP09].) Moreover, the storage
size of the new trapdoor grows only linearly in the lattice dimension $m$, rather than quadratically as a basis
does. This is most significant for applications like hierarchical ID-based encryption [CHKP10, ABB10a]
that delegate trapdoors for increasing values of $m$. The new trapdoor also admits a very simple and efficient
delegation mechanism, which unlike the prior method [CHKP10] does not require any costly operations like
linear  independence  tests,  or  conversions  from  a  full-rank  set  of  lattice  vectors  into  a  basis.  In  summary, 
the  new  type  of  trapdoor  and  its  associated  algorithms  are  strictly  preferable  to  a  short  basis  in  terms  of 
algorithmic  efficiency,  output  quality,  and  storage  size  (simultaneously). 

Ring-based constructions. Finally, and most importantly for practice, all of the above-described
constructions and algorithms extend immediately to the ring setting, where functions analogous to $f_A$ and $g_A$ require
only quasi-linear $\tilde{O}(n)$ space and time to specify and evaluate (respectively), which is a factor of $\tilde{\Omega}(n)$
improvement over the matrix-based functions defined above. See the representative works [Mic02, PR06,
LM06, LMPR08, LPR10] for more details on these functions and their security foundations.



Figure 1: Summary of parameters for our constructions and algorithms versus prior ones. In the column
labelled “this work,” $\overset{s}{\approx}$ and $\overset{c}{\approx}$ denote constructions producing public keys $A$ that are statistically close to
uniform, and computationally pseudorandom, respectively. (All quality terms $s$ and length bounds $\beta$ omit the
same statistical “smoothing” factor for $\mathbb{Z}$, which is about 4-5 in practice.)


To illustrate the kinds of concrete improvements that our methods provide, in Figure 2 we give
representative parameters for the canonical application of GPV signatures [GPV08], comparing the old and
new trapdoor constructions for nearly equal levels of concrete security. We stress that these parameters are
not  highly  optimized,  and  making  adjustments  to  some  of  the  tunable  parameters  in  our  constructions  may 
provide  better  combinations  of  efficiency  and  concrete  security.  We  leave  this  effort  for  future  work. 


1.2  Techniques 

The  main  idea  behind  our  new  method  of  trapdoor  generation  is  as  follows.  Instead  of  building  a  random 
matrix  A  through  some  specialized  and  complex  process,  we  start  from  a  carefully  crafted  public  matrix  G 
(and its associated lattice), for which the associated functions $f_G$ and $g_G$ admit very efficient (in both
sequential and parallel complexity) and high-quality inversion algorithms. In particular, preimage sampling
for $f_G$ and inversion for $g_G$ can be performed in essentially $O(n \log n)$ sequential time, and can even be
performed by $n$ parallel $O(\log n)$-time operations or table lookups. (This should be compared with the
general algorithms for these tasks, which require at least quadratic $\Omega(n^2 \log^2 n)$ time, and are not always
parallelizable  for  optimal  noise  parameters.)  We  emphasize  that  G  is  not  a  cryptographic  key,  but  rather  a 
fixed  and  public  matrix  that  may  be  used  by  all  parties,  so  the  implementation  of  all  its  associated  operations 
can  be  highly  optimized,  in  both  software  and  hardware.  We  also  mention  that  the  simplest  and  most 
practically efficient choices of $G$ work for a modulus $q$ that is a power of a small prime, such as $q = 2^k$, but a
crucial search/decision reduction for LWE was not previously known for such $q$, despite its obvious practical
utility. In Section 3 we provide a very general reduction that covers this case and others, and subsumes all of
the known (and incomparable) search/decision reductions for LWE [BFKL93, Reg05, Pei09b, ACPS09].

To generate a random matrix $A$ with a trapdoor, we take two additional steps: first, we extend $G$
into a semi-random matrix $A' = [\bar{A} \mid G]$, for uniform $\bar{A} \in \mathbb{Z}_q^{n \times \bar{m}}$ and sufficiently large $\bar{m}$. (As shown
in [CHKP10], inversion of $g_{A'}$ and preimage sampling for $f_{A'}$ reduce very efficiently to the corresponding
tasks for $g_G$ and $f_G$.) Finally, we simply apply to $A'$ a certain random unimodular transformation defined by
the matrix $T = \begin{bmatrix} I & -R \\ 0 & I \end{bmatrix}$, for a random “short” secret matrix $R$ that will serve as the trapdoor, to obtain

$A = A' \cdot T = [\bar{A} \mid G - \bar{A}R]$.

Figure 2: Representative parameters for GPV signatures (using fast inversion algorithms) for the old and new
trapdoor generation methods. Using the methodology from [MR09], both sets of parameters have security
level corresponding to a parameter $\delta$ of at most 1.007, which is estimated to require about $2^{46}$ core-years
on a 64-bit 1.86 GHz Xeon using the state-of-the-art in lattice basis reduction [GN08, CN11]. We use a
smoothing parameter of $r = 4.5$ for $\mathbb{Z}$, which corresponds to statistical error of less than $2^{-90}$ for each
randomized-rounding operation during signing. Key sizes are calculated using the Hermite normal form
optimization. Key sizes for ring-based GPV signatures are approximated to be smaller by a factor of about
$0.9n$.

The  transformation  given  by  T  has  the  following  properties: 

•  It is very easy to compute and invert, requiring essentially just one multiplication by $R$ in both cases.
(Note that $T^{-1} = \begin{bmatrix} I & R \\ 0 & I \end{bmatrix}$.)

•  It  results  in  a  matrix  A  that  is  distributed  essentially  uniformly  at  random,  as  required  by  the  security 
reductions  (and  worst-case  hardness  proofs)  for  lattice-based  cryptographic  schemes. 

•  For the resulting functions $f_A$ and $g_A$, preimage sampling and inversion very simply and efficiently
reduce to the corresponding tasks for $f_G$, $g_G$. The overhead of the reduction is essentially just a single
matrix-vector product with the secret matrix $R$ (which, when inverting $f_A$, can largely be precomputed
even before the target value is known).
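
The reduction to the gadget rests on the identity $A \cdot \begin{bmatrix} R \\ I \end{bmatrix} = G \bmod q$, which follows from multiplying out the two blocks of $A$ (writing $\bar{A}$ for its uniform block). The NumPy sketch below checks the identity numerically. It is a toy illustration under stated assumptions, not the construction analyzed in this work: the gadget is taken to be $G = I_n \otimes (1, 2, \ldots, 2^{k-1})$ for $q = 2^k$, the entries of $R$ are drawn uniformly from $\{-1, 0, 1\}$ as a stand-in for the actual trapdoor distribution, and the names `gadget` and `gen_trap` are hypothetical.

```python
import numpy as np

def gadget(n, k):
    # Assumed gadget: block-diagonal matrix with n copies of the row (1, 2, ..., 2^(k-1)).
    g = 2 ** np.arange(k, dtype=np.int64)
    return np.kron(np.eye(n, dtype=np.int64), g.reshape(1, k))

def gen_trap(n, k, m_bar, rng):
    # Returns A = [A_bar | G - A_bar R] mod q together with the trapdoor R.
    q = 2 ** k
    G = gadget(n, k)
    A_bar = rng.integers(0, q, size=(n, m_bar), dtype=np.int64)
    # Placeholder "short" distribution: uniform over {-1, 0, 1} (an assumption).
    R = rng.integers(-1, 2, size=(m_bar, n * k), dtype=np.int64)
    A = np.concatenate([A_bar, (G - A_bar @ R) % q], axis=1)
    return A, R, G, q
```

Stacking $R$ on top of an identity matrix and multiplying by $A$ should return $G$ modulo $q$; the sketch makes no attempt to choose $\bar{m}$ large enough for $A$ to be close to uniform.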

As a result, the cost of the inversion operations ends up being very close to that of computing $f_A$ and $g_A$ in the
forward direction. Moreover, the fact that the running time is dominated by matrix-vector multiplications with
the fixed trapdoor matrix $R$ yields theoretical (but asymptotically significant) improvements in the context
of batch execution of several operations relative to the same secret key $R$: instead of evaluating several
products $Rz_1, Rz_2, \ldots, Rz_n$ individually at a total cost of $O(n^3)$, one can employ fast matrix multiplication
techniques to evaluate $R[z_1, \ldots, z_n]$ as a whole in subcubic time. Batch operations can be exploited in
applications like the multi-bit IBE of [GPV08] and its extensions to HIBE [CHKP10, ABB10a, ABB10b].

Related  techniques.  At  the  surface,  our  trapdoor  generator  appears  similar  to  the  original  “GGH”  approach 
of [GGH97] for generating a lattice together with a short basis. That technique works by choosing some
random short vectors as the secret “good basis” of a lattice, and then transforming them into a public “bad basis”
for the same lattice, via a unimodular matrix having large entries. (Note, though, that this does not produce
a lattice from Ajtai’s worst-case-hard family.) A closer look reveals, however, that (worst-case hardness
aside) our method is actually not an instance of the GGH paradigm: here the initial short basis of the lattice
defined by $G$ (or the semi-random matrix $[\bar{A} \mid G]$) is fixed and public, while the random unimodular matrix
$T = \begin{bmatrix} I & -R \\ 0 & I \end{bmatrix}$ actually produces a new lattice by applying a (reversible) linear transformation to the original
lattice. In other words, in contrast with GGH we multiply a (short) unimodular matrix on the “other side” of
the original short basis, thus changing the lattice it generates.

A more appropriate comparison is to Ajtai’s original method [Ajt96] for generating a random $A$ together
with a “weak” trapdoor of one or more short lattice vectors (but not a full basis). There, one simply chooses a
semi-random matrix $A' = [\bar{A} \mid 0]$ and outputs $A = A' \cdot T = [\bar{A} \mid -\bar{A}R]$, with short vectors $\begin{bmatrix} R \\ I \end{bmatrix}$. Perhaps
surprisingly, our strong trapdoor generator is just a simple twist on Ajtai’s original weak generator, replacing
$0$ with the gadget $G$.

Our  constructions  and  inversion  algorithms  also  draw  upon  several  other  techniques  from  throughout  the 
literature. The trapdoor basis generator of [AP09] and the LWE-based “lossy” injective trapdoor function
of [PW08] both use a fixed “gadget” matrix analogous to $G$, whose entries grow geometrically in a structured
way. In both cases, the gadget is concealed (either statistically or computationally) in the public key by
a small combination of uniformly random vectors. Our method for adding tags to the trapdoor is very
similar to a technique for doing the same with the lossy TDF of [PW08], and is identical to the method used
in [ABB10a] for constructing compact (H)IBE. Finally, in our preimage sampling algorithm for $f_A$, we use
the “convolution” technique from [Pei10] to correct for some statistical skew that arises when converting
preimages for $f_G$ to preimages for $f_A$, which would otherwise leak information about the trapdoor $R$.


1.3  Applications 

Our  improved  trapdoor  generator  and  inversion  algorithms  can  be  plugged  into  any  scheme  that  uses  such  tools 
as  a  “black  box,”  and  the  resulting  scheme  will  inherit  all  the  efficiency  improvements.  (Every  application 
we  know  of  admits  such  a  black-box  replacement.)  Moreover,  the  special  properties  of  our  methods  allow 
for  further  improvements  to  the  design,  efficiency,  and  security  reductions  of  existing  schemes.  Here  we 
summarize some representative improvements that are possible to obtain; see Section 6 for complete details.

Hash-and-sign  digital  signatures.  Our  construction  and  supporting  algorithms  plug  directly  into  the  “full 
domain hash” signature scheme of [GPV08], which is strongly unforgeable in the random oracle model, with
a  tight  security  reduction.  One  can  even  use  our  computationally  secure  trapdoor  generator  to  obtain  a  smaller 
public  verification  key,  though  at  the  cost  of  a  hardness-of-LWE  assumption,  and  a  somewhat  stronger  SIS 
assumption  (which  affects  concrete  security).  Determining  the  right  balance  between  key  size  and  security  is 
left  for  later  work. 

In  the  standard  model,  there  are  two  closely  related  types  of  hash-and-sign  signature  schemes: 

•  The one of [CHKP10], which has signatures of bit length $O(n^2)$, and is existentially unforgeable (later
improved to be strongly unforgeable [Ruc10]) assuming the hardness of inverting $f_A$ with solution
length bounded by $\beta = O(n^{1.5})$ (see footnote 2).

•  The scheme of [Boy10], a lattice analogue of the pairing-based signature of [Wat05], which has
signatures of bit length $O(n)$ and is existentially unforgeable assuming the hardness of inverting $f_A$
with solution length bounded by $\beta = O(n^{3.5})$.

We improve the latter scheme in several ways, by: (i) improving the length bound to $\beta = O(n^{2.5})$; (ii) reducing
the online runtime of the signing algorithm from $O(n^3)$ to $O(n^2)$ via chameleon hashing [KR00]; (iii) making
the scheme strongly unforgeable a la [GPV08, Ruc10]; (iv) giving a tighter and simpler security reduction
(using a variant of the “prefix technique” [HW09] as in [CHKP10]), where the reduction’s advantage degrades
only linearly in the number of signature queries; and (v) removing all additional constraints on the parameters
$n$ and $q$ (aside from those needed to ensure hardness of the SIS problem). We stress that the scheme itself
is essentially the same (up to the improved and generalized parameters, and chameleon hashing) as that
of [Boy10]; only the security proof and underlying assumption are improved. Note that in comparison
with [CHKP10], there is still a trade-off between the bit length of the signatures and the bound $\beta$ in the
underlying SIS assumption; this appears to be inherent to the style of the security reduction. Note also that the
public keys in all of these schemes are still rather large at $O(n^3)$ bits (or $O(n^2)$ bits using the ring analogue
of SIS), so they are still mainly of theoretical interest. Improving the key sizes of standard-model signatures
is an important open problem.

2  All parameters in this discussion assume a message length of $O(n)$ bits.


Chosen ciphertext-secure encryption. We give a new construction of CCA-secure public-key encryption (in the
standard model) from the learning with errors (LWE) problem with error rate $\alpha = 1/\mathrm{poly}(n)$, where larger $\alpha$
corresponds to a harder concrete problem. Existing schemes exhibit various incomparable tradeoffs between
key size and error rate. The first such scheme is due to [PW08]: it has public keys of size $O(n^2)$ bits (with
somewhat large hidden factors) and relies on a quite small LWE error rate of $\alpha = O(1/n^4)$. The next scheme,
from [Pei09b], has larger public keys of $O(n^3)$ bits, but uses a better error rate of $\alpha = O(1/n)$. Finally, using
the generic conversion from selectively secure ID-based encryption to CCA-secure encryption [BCHK07],
one can obtain from [ABB10a] a scheme having key size $O(n^2)$ bits and using error rate $\alpha = O(1/n^2)$.
(Here decryption is randomized, since the IBE key-derivation algorithm is.) In particular, the public key of
the scheme from [ABB10b] consists of 3 matrices in $\mathbb{Z}_q^{n \times m}$ where $m$ is large enough to embed a (strong)
trapdoor, plus essentially one vector in $\mathbb{Z}_q^n$ per message bit.

We give a CCA-secure system that enjoys the best of all prior constructions, which has $O(n^2)$-bit public
keys, uses error rate $\alpha = O(1/n)$ (both with small hidden factors), and has deterministic decryption. To
achieve this, we need to go beyond just plugging our improved trapdoor generator as a black box into prior
constructions. Our scheme relies on the particular structure of the trapdoor instances; in effect, we directly
construct a “tag-based adaptive trapdoor function” [KMO10]. The public key consists of only 1 matrix with
an embedded (strong) trapdoor, rather than 3 as in the most compact scheme to date [ABB10a]; moreover,
we can encrypt up to $n \log q$ message bits per ciphertext without needing any additional public key material.
Combining these design changes with the improved dimension of our trapdoor generator, we obtain more than
a 7.5-fold improvement in the public key size as compared with [ABB10a]. (This figure does not account for
removing  the  extra  public  key  material  for  the  message  bits,  nor  the  other  parameter  improvements  implied 
by  our  weaker  concrete  LWE  assumption,  which  would  shrink  the  keys  even  further.) 


(Hierarchical) identity-based encryption. Just as with signatures, our constructions plug directly into the
random-oracle IBE of [GPV08]. In the standard-model depth-$d$ hierarchical IBEs of [CHKP10, ABB10a],
our techniques can shrink the public parameters by an additional factor of roughly 3 to 4, relative to
just plugging our improved trapdoor generator as a “black box” into the schemes. This is because for each
level of the hierarchy, the public parameters only need to contain one matrix of the same dimension as $G$
(i.e., about $n \lg q$), rather than two full trapdoor matrices (of dimension about $2n \lg q$ each) (see footnote 3).
Because the adaptation is straightforward given the tools developed in this work, we omit the details.

3  We note that in [Pei09a] (an earlier version of [CHKP10]) the schemes are defined in a similar way using lower-dimensional
extensions, rather than full trapdoor matrices at each level.


1.4 Other Related Work


Concrete parameter settings for a variety of “strong” trapdoor applications are given in [RS10]. Those parameters
are derived using the previous suboptimal generator of [AP09], and using the methods from this work would
yield substantial improvements. The recent work of [LP11] also gives improved key sizes and concrete
security  for  LWE-based  cryptosystems;  however,  that  work  deals  only  with  IND-CPA-secure  encryption, 
and  not  at  all  with  strong  trapdoors  or  the  further  applications  they  enable  (CCA  security,  digital  signatures, 
(H)IBE,  etc.). 


2  Preliminaries 


We denote the real numbers by $\mathbb{R}$ and the integers by $\mathbb{Z}$. For a nonnegative integer $k$, we let $[k] = \{1, \ldots, k\}$.
Vectors are denoted by lower-case bold letters (e.g., $x$) and are always in column form ($x^t$ is a row vector).
We denote matrices by upper-case bold letters, and treat a matrix $X$ interchangeably with its ordered set
$\{x_1, x_2, \ldots\}$ of column vectors. For convenience, we sometimes use a scalar $s$ to refer to the scaled identity
matrix $sI$, where the dimension will be clear from context.

The statistical distance between two distributions $X$, $Y$ over a finite or countable domain $D$ is
$\Delta(X, Y) = \frac{1}{2} \sum_{w \in D} |X(w) - Y(w)|$. Statistical distance is a metric, and in particular obeys the triangle inequality. We
say that a distribution over $D$ is $\epsilon$-uniform if its statistical distance from the uniform distribution is at most $\epsilon$.
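
For finite distributions the definition translates directly into code. The following is a minimal sketch, assuming each distribution is represented as a Python dictionary mapping outcomes to probabilities (a representation chosen here purely for illustration).

```python
def statistical_distance(X, Y):
    # Half the L1 distance between the two probability mass functions,
    # summed over the union of their supports.
    support = set(X) | set(Y)
    return 0.5 * sum(abs(X.get(w, 0.0) - Y.get(w, 0.0)) for w in support)

def is_eps_uniform(X, domain, eps):
    # True if X is within statistical distance eps of uniform over `domain`.
    uniform = {w: 1.0 / len(domain) for w in domain}
    return statistical_distance(X, uniform) <= eps
```

For instance, a coin that lands heads with probability 3/4 is at statistical distance 1/4 from a fair coin.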

Throughout the paper, we use a “randomized-rounding parameter” $r$ that we let be a fixed function
$r(n) = \omega(\sqrt{\log n})$ growing asymptotically faster than $\sqrt{\log n}$. By “fixed function” we mean that
$r = \omega(\sqrt{\log n})$ always refers to the very same function, and no other factors will be absorbed into the $\omega(\cdot)$
notation. This allows us to keep track of the precise multiplicative constants introduced by our constructions.
Concretely, we take $r \approx \sqrt{\ln(2/\epsilon)/\pi}$ where $\epsilon$ is a desired bound on the statistical error introduced by
each randomized-rounding operation for $\mathbb{Z}$, because the error is bounded by $\approx 2\exp(-\pi r^2)$ according to
Lemma 2.3 below. For example, for $\epsilon = 2^{-54}$ we have $r \leq 3.5$, and for $\epsilon = 2^{-71}$ we have $r \leq 4$.
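
The concrete values of $r$ can be reproduced by solving $2\exp(-\pi r^2) = \epsilon$ for $r$. The short Python sketch below does this for $\epsilon = 2^{-k}$; the function names are illustrative, and the error bound is treated as an exact equality for simplicity.

```python
import math

def rounding_parameter(k):
    # r = sqrt(ln(2 / eps) / pi) for eps = 2^-k, using ln(2 / eps) = (k + 1) ln 2.
    return math.sqrt((k + 1) * math.log(2) / math.pi)

def log2_error_bound(r):
    # Base-2 logarithm of the bound 2 * exp(-pi * r^2) on the per-operation error.
    return 1 - math.pi * r * r / math.log(2)
```

This gives $r \approx 3.48$ for $\epsilon = 2^{-54}$ and $r \approx 3.99$ for $\epsilon = 2^{-71}$; conversely, $r = 4.5$ yields an error bound slightly below $2^{-90}$.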


2.1  Linear  Algebra 

A unimodular matrix $U \in \mathbb{Z}^{m \times m}$ is one for which $|\det(U)| = 1$; in particular, $U^{-1} \in \mathbb{Z}^{m \times m}$ as well. The
Gram-Schmidt orthogonalization of an ordered set of vectors $V = \{v_1, \ldots, v_k\} \subset \mathbb{R}^n$ is $\tilde{V} = \{\tilde{v}_1, \ldots, \tilde{v}_k\}$
where $\tilde{v}_i$ is the component of $v_i$ orthogonal to $\mathrm{span}(v_1, \ldots, v_{i-1})$ for all $i = 1, \ldots, k$. (In some cases we
orthogonalize the vectors in a different order.) In matrix form, $V = QDU$ for some orthogonal $Q \in \mathbb{R}^{n \times k}$,
diagonal $D \in \mathbb{R}^{k \times k}$ with nonnegative entries, and upper unitriangular $U \in \mathbb{R}^{k \times k}$ (i.e., $U$ is upper triangular
with 1s on the diagonal). The decomposition is unique when the $v_i$ are linearly independent, and we always
have $\|\tilde{v}_i\| = d_{i,i}$, the $i$th diagonal entry of $D$.
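
The definition can be made concrete with a few lines of NumPy. The sketch below is a textbook Gram-Schmidt procedure (without normalization), assuming the columns of $V$ are linearly independent; it returns $\tilde{V}$ and the upper unitriangular $U$ with $V = \tilde{V}U$, so that $\tilde{V} = QD$ in the notation above. It is meant for illustration only and makes no claim about numerical robustness.

```python
import numpy as np

def gram_schmidt(V):
    # Columns of V are the input vectors; returns (V_tilde, U) with V = V_tilde @ U.
    V = np.asarray(V, dtype=float)
    k = V.shape[1]
    V_tilde = np.zeros_like(V)
    U = np.eye(k)
    for i in range(k):
        V_tilde[:, i] = V[:, i]
        for j in range(i):
            # Coefficient of v_i along the earlier orthogonalized vector v~_j.
            U[j, i] = (V[:, i] @ V_tilde[:, j]) / (V_tilde[:, j] @ V_tilde[:, j])
            V_tilde[:, i] -= U[j, i] * V_tilde[:, j]
    return V_tilde, U
```

The column norms of the returned `V_tilde` are the diagonal entries of $D$.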

For any basis $V = \{v_1, \ldots, v_n\}$ of $\mathbb{R}^n$, its origin-centered parallelepiped is defined as
$\mathcal{P}_{1/2}(V) = V \cdot [-\frac{1}{2}, \frac{1}{2})^n$. Its dual basis is defined as $V^* = V^{-t} = (V^{-1})^t$. If we orthogonalize $V$ and $V^*$ in forward
and reverse order, respectively, then we have $\tilde{v}_i^* = \tilde{v}_i / \|\tilde{v}_i\|^2$ for all $i$. In particular, $\|\tilde{v}_i^*\| = 1/\|\tilde{v}_i\|$.

For any square real matrix $X$, the (Moore-Penrose) pseudoinverse, denoted $X^+$, is the unique matrix
satisfying $(XX^+)X = X$, $X^+(XX^+) = X^+$, and such that both $XX^+$ and $X^+X$ are symmetric. We
always have $\mathrm{span}(X) = \mathrm{span}(X^+)$, and when $X$ is invertible, we have $X^+ = X^{-1}$.

A symmetric matrix $\Sigma \in \mathbb{R}^{n \times n}$ is positive definite (respectively, positive semidefinite), written $\Sigma > 0$
(resp., $\Sigma \geq 0$), if $x^t \Sigma x > 0$ (resp., $x^t \Sigma x \geq 0$) for all nonzero $x \in \mathbb{R}^n$. We have $\Sigma > 0$ if and only if $\Sigma$
is invertible and $\Sigma^{-1} > 0$, and $\Sigma \geq 0$ if and only if $\Sigma^+ \geq 0$. Positive (semi)definiteness defines a partial
ordering on symmetric matrices: we say that $\Sigma_1 \geq \Sigma_2$ if $(\Sigma_1 - \Sigma_2) \geq 0$, and similarly for $\Sigma_1 > \Sigma_2$. We
have $\Sigma_1 \geq \Sigma_2 \geq 0$ if and only if $\Sigma_2^+ \geq \Sigma_1^+ \geq 0$, and likewise for the analogous strict inequalities.

For any matrix $B$, the symmetric matrix $\Sigma = BB^t$ is positive semidefinite, because

$x^t \Sigma x = \langle B^t x, B^t x \rangle = \|B^t x\|^2 \geq 0$

for any nonzero $x \in \mathbb{R}^n$, where the inequality is always strict if and only if $B$ is nonsingular. We say that
$B$ is a square root of $\Sigma > 0$, written $B = \sqrt{\Sigma}$, if $BB^t = \Sigma$. Every $\Sigma \geq 0$ has a square root, which can be
computed efficiently, e.g., via the Cholesky decomposition.
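
As an illustration of the Cholesky route, the sketch below uses NumPy's `numpy.linalg.cholesky`, which returns a lower-triangular factor $B$ with $BB^t = \Sigma$. Note the assumption that $\Sigma$ is strictly positive definite: this routine raises an error on singular inputs, so the merely semidefinite case would need a different method.

```python
import numpy as np

def sqrt_via_cholesky(Sigma):
    # Lower-triangular B with B @ B.T equal to Sigma (Sigma must be positive definite).
    return np.linalg.cholesky(np.asarray(Sigma, dtype=float))
```

The square root is not unique: $BQ$ is also a square root of $\Sigma$ for any orthogonal $Q$.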

For any matrix $B \in \mathbb{R}^{n \times k}$, there exists a singular value decomposition $B = QDP^t$, where $Q \in \mathbb{R}^{n \times n}$,
$P \in \mathbb{R}^{k \times k}$ are orthogonal matrices, and $D \in \mathbb{R}^{n \times k}$ is a diagonal matrix with nonnegative entries $s_i \geq 0$ on
the diagonal, in non-increasing order. The $s_i$ are called the singular values of $B$. Under this convention, $D$ is
uniquely determined (though $Q$, $P$ may not be), and $s_1(B) = \max_u \|Bu\| = \max_u \|B^t u\| \geq \|B\|, \|B^t\|$,
where the maxima are taken over all unit vectors $u \in \mathbb{R}^k$.
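
The inequality $s_1(B) \geq \|B\|, \|B^t\|$ can be checked numerically. The sketch below assumes that $\|B\|$ denotes the length of the longest column of $B$ (an interpretation not spelled out in this passage), and uses `numpy.linalg.svd`, which returns singular values in non-increasing order.

```python
import numpy as np

def largest_singular_value(B):
    # First entry of the singular value list, i.e. s_1(B).
    return np.linalg.svd(np.asarray(B, dtype=float), compute_uv=False)[0]

def max_column_norm(B):
    # Euclidean length of the longest column of B (assumed meaning of ||B||).
    return np.linalg.norm(np.asarray(B, dtype=float), axis=0).max()
```

Taking $u$ to be a standard basis vector shows why the bound holds: $\|Bu\|$ is then exactly the length of one column.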


2.2  Lattices  and  Hard  Problems 


Generally defined, an $m$-dimensional lattice $\Lambda$ is a discrete additive subgroup of $\mathbb{R}^m$. For some $k \leq m$, called
the rank of the lattice, $\Lambda$ is generated as the set of all $\mathbb{Z}$-linear combinations of some $k$ linearly independent
basis vectors $B = \{b_1, \ldots, b_k\}$, i.e., $\Lambda = \{Bz : z \in \mathbb{Z}^k\}$. In this work, we are mostly concerned with
full-rank integer lattices, i.e., $\Lambda \subseteq \mathbb{Z}^m$ with $k = m$. (We work with non-full-rank lattices only in the analysis
of our Gaussian sampling algorithm in Section 5.4.) The dual lattice $\Lambda^*$ is the set of all $v \in \mathrm{span}(\Lambda)$ such
that $\langle v, x \rangle \in \mathbb{Z}$ for every $x \in \Lambda$. If $B$ is a basis of $\Lambda$, then $B^* = B(B^t B)^{-1}$ is a basis of $\Lambda^*$. Note that when
$\Lambda$ is full-rank, $B$ is invertible and hence $B^* = B^{-t}$.

Many  cryptographic  applications  use  a  particular  family  of  so-called  q-ary  integer  lattices,  which  contain 
$q\mathbb{Z}^m$ as a sublattice for some (typically small) integer $q$. For positive integers $n$ and $q$, let $A \in \mathbb{Z}_q^{n \times m}$ be
arbitrary and define the following full-rank $m$-dimensional $q$-ary lattices:

$\Lambda^\perp(A) = \{z \in \mathbb{Z}^m : Az = 0 \bmod q\}$

$\Lambda(A^t) = \{z \in \mathbb{Z}^m : \exists\, s \in \mathbb{Z}_q^n \text{ s.t. } z = A^t s \bmod q\}$.

It is easy to check that $\Lambda^\perp(A)$ and $\Lambda(A^t)$ are dual lattices, up to a $q$ scaling factor: $q \cdot \Lambda^\perp(A)^* = \Lambda(A^t)$,
and vice-versa. For this reason, it is sometimes more natural to consider the non-integral, “1-ary” lattice
$\frac{1}{q}\Lambda(A^t) = \Lambda^\perp(A)^* \supseteq \mathbb{Z}^m$. For any $u \in \mathbb{Z}_q^n$ admitting an integral solution to $Ax = u \bmod q$, define the
coset (or “shifted” lattice)

$\Lambda_u^\perp(A) = \{z \in \mathbb{Z}^m : Az = u \bmod q\} = \Lambda^\perp(A) + x$.
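
Membership in $\Lambda^\perp(A)$ and in a coset $\Lambda_u^\perp(A)$ reduces to a single matrix-vector product modulo $q$. The NumPy sketch below illustrates this; the helper names are hypothetical and the code is only a membership test, not a way to find short vectors.

```python
import numpy as np

def in_perp_lattice(A, z, q):
    # z is in the lattice of integer vectors with A z = 0 mod q.
    return bool(np.all((A @ z) % q == 0))

def in_coset(A, z, u, q):
    # z is in the coset of integer vectors with A z = u mod q.
    return bool(np.all((A @ z) % q == u % q))
```

In particular, every vector of $q\mathbb{Z}^m$ passes the first test, and adding any vector of $\Lambda^\perp(A)$ to a solution $x$ of $Ax = u \bmod q$ stays inside the coset.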


Here  we  recall  some  basic  facts  about  these  q-ary  lattices. 

Lemma 2.1. Let $A \in \mathbb{Z}_q^{n \times m}$ be arbitrary and let $S \in \mathbb{Z}^{m \times m}$ be any basis of $\Lambda^\perp(A)$.

1.  For any unimodular $T \in \mathbb{Z}^{m \times m}$, we have $T \cdot \Lambda^\perp(A) = \Lambda^\perp(A \cdot T^{-1})$, with $T \cdot S$ as a basis.

2.  [ABB10a, implicit] For any invertible $H \in \mathbb{Z}_q^{n \times n}$, we have $\Lambda^\perp(H \cdot A) = \Lambda^\perp(A)$.

3.  [CHKP10, Lemma 3.2] Suppose that the columns of $A$ generate all of $\mathbb{Z}_q^n$, let $A' \in \mathbb{Z}_q^{n \times m'}$ be arbitrary,
and let $W \in \mathbb{Z}^{m \times m'}$ be an arbitrary solution to $AW = -A' \bmod q$. Then $S' = \begin{bmatrix} I & 0 \\ W & S \end{bmatrix}$ is a basis of
$\Lambda^\perp([A' \mid A])$, and when orthogonalized in appropriate order, $\tilde{S}' = \begin{bmatrix} I & 0 \\ 0 & \tilde{S} \end{bmatrix}$. In particular, $\|\tilde{S}'\| = \|\tilde{S}\|$.

Cryptographic problems. For $\beta > 0$, the short integer solution problem $\mathrm{SIS}_{q,\beta}$ is an average-case version
of the approximate shortest vector problem on $\Lambda^\perp(A)$. The problem is: given uniformly random $A \in \mathbb{Z}_q^{n \times m}$
for any desired $m = \mathrm{poly}(n)$, find a relatively short nonzero $z \in \Lambda^\perp(A)$, i.e., output a nonzero $z \in \mathbb{Z}^m$ such
that $Az = 0 \bmod q$ and $\|z\| \leq \beta$. When $q \geq \beta\sqrt{n} \cdot \omega(\sqrt{\log n})$, solving this problem (with any non-negligible
probability over the random choice of $A$) is at least as hard as (probabilistically) approximating the Shortest
Independent Vectors Problem (SIVP, a classic problem in the computational study of point lattices [MG02])
on $n$-dimensional lattices to within $\tilde{O}(\beta\sqrt{n})$ factors in the worst case [Ajt96, MR04, GPV08].

For $\alpha > 0$, the learning with errors problem $\mathrm{LWE}_{q,\alpha}$ may be seen as an average-case version of the bounded-distance decoding problem on the dual lattice $\frac{1}{q}\Lambda(\mathbf{A}^t)$. Let $\mathbb{T} = \mathbb{R}/\mathbb{Z}$, the additive group of reals modulo 1, and let $D_\alpha$ denote the Gaussian probability distribution over $\mathbb{R}$ with parameter $\alpha$ (see Section 2.3 below). For any fixed $\mathbf{s} \in \mathbb{Z}_q^n$, define $A_{\mathbf{s},\alpha}$ to be the distribution over $\mathbb{Z}_q^n \times \mathbb{T}$ obtained by choosing $\mathbf{a} \leftarrow \mathbb{Z}_q^n$ uniformly at random, choosing $e \leftarrow D_\alpha$, and outputting $(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle / q + e \bmod 1)$. The search-$\mathrm{LWE}_{q,\alpha}$ problem is: given any desired number $m = \mathrm{poly}(n)$ of independent samples from $A_{\mathbf{s},\alpha}$ for some arbitrary $\mathbf{s}$, find $\mathbf{s}$. The decision-$\mathrm{LWE}_{q,\alpha}$ problem is to distinguish, with non-negligible advantage, between samples from $A_{\mathbf{s},\alpha}$ for uniformly random $\mathbf{s} \in \mathbb{Z}_q^n$, and uniformly random samples from $\mathbb{Z}_q^n \times \mathbb{T}$. There are a variety of (incomparable) search/decision reductions for LWE under certain conditions on the parameters (e.g., [Reg05, Pei09b, ACPS09]); in Section 3 we give a reduction that essentially subsumes them all. When $q \ge 2\sqrt{n}/\alpha$, solving search-$\mathrm{LWE}_{q,\alpha}$ is at least as hard as quantumly approximating SIVP on $n$-dimensional lattices to within $\tilde{O}(n/\alpha)$ factors in the worst case [Reg05]. For a restricted range of parameters (e.g., when $q$ is exponentially large) a classical (non-quantum) reduction is also known [Pei09b], but only from a potentially easier class of problems like the decisional Shortest Vector Problem (GapSVP) and the Bounded Distance Decoding Problem (BDD) (see [LM09]).
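The distribution $A_{\mathbf{s},\alpha}$ can be illustrated with a short sketch. The function names, the toy parameters, and the use of floating-point arithmetic are assumptions made for illustration only. The Gaussian $D_\alpha$ has density proportional to $\exp(-\pi x^2/\alpha^2)$, which corresponds to a normal distribution with standard deviation $\alpha/\sqrt{2\pi}$.

```python
import math
import random

def gaussian(alpha, rng):
    """Sample D_alpha: density proportional to exp(-pi * x^2 / alpha^2),
    i.e. a normal distribution with standard deviation alpha / sqrt(2*pi)."""
    return rng.gauss(0.0, alpha / math.sqrt(2 * math.pi))

def lwe_sample(s, q, alpha, rng):
    """Return one sample (a, b) from A_{s,alpha}, with b reduced modulo 1."""
    a = [rng.randrange(q) for _ in s]
    inner = sum(ai * si for ai, si in zip(a, s)) % q
    b = (inner / q + gaussian(alpha, rng)) % 1.0
    return a, b

def uniform_sample(n, q, rng):
    """Return one uniform sample from Z_q^n x T (the decision-LWE alternative)."""
    return [rng.randrange(q) for _ in range(n)], rng.random()

def centered_error(a, b, s, q):
    """Return b - <a,s>/q mod 1 as a representative in [-1/2, 1/2)."""
    inner = sum(ai * si for ai, si in zip(a, s)) % q
    return ((b - inner / q + 0.5) % 1.0) - 0.5
```

With knowledge of $\mathbf{s}$, `centered_error` exposes the small noise term of a genuine LWE sample, whereas for a uniform sample the same quantity is spread over the whole interval.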

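Note that the $m$ samples $(\mathbf{a}_i, b_i)$ and underlying error terms $e_i$ from $A_{\mathbf{s},\alpha}$ may be grouped into a matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ and vectors $\mathbf{b} \in \mathbb{T}^m$, $\mathbf{e} \in \mathbb{R}^m$ in the natural way, so that $\mathbf{b} = (\mathbf{A}^t\mathbf{s})/q + \mathbf{e} \bmod 1$. In this way, $\mathbf{b}$ may be seen as an element of $\Lambda^{\perp}(\mathbf{A})^{*} = \frac{1}{q}\Lambda(\mathbf{A}^t)$, perturbed by Gaussian error. By scaling $\mathbf{b}$ and discretizing its entries using a form of randomized rounding (see [Pei10]), we can convert it into $\mathbf{b}' = \mathbf{A}^t\mathbf{s} + \mathbf{e}' \bmod q$ where $\mathbf{e}' \in \mathbb{Z}^m$ has discrete Gaussian distribution with parameter (say) $\sqrt{2}\,\alpha q$.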


2.3  Gaussians  and  Lattices 

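The $n$-dimensional Gaussian function $\rho : \mathbb{R}^n \to (0, 1]$ is defined as

\[ \rho(\mathbf{x}) = \exp(-\pi \cdot \|\mathbf{x}\|^2) = \exp(-\pi \cdot \langle \mathbf{x}, \mathbf{x} \rangle). \]

Applying a linear transformation given by a (not necessarily square) matrix $\mathbf{B}$ with linearly independent columns yields the (possibly degenerate) Gaussian function

\[ \rho_{\mathbf{B}}(\mathbf{x}) = \begin{cases} \rho(\mathbf{B}^{+}\mathbf{x}) = \exp(-\pi \cdot \mathbf{x}^t \Sigma^{+} \mathbf{x}) & \text{if } \mathbf{x} \in \mathrm{span}(\mathbf{B}) = \mathrm{span}(\Sigma), \\ 0 & \text{otherwise,} \end{cases} \]

where $\Sigma = \mathbf{B}\mathbf{B}^t \ge 0$. Because $\rho_{\mathbf{B}}$ is distinguished only up to $\Sigma$, we usually refer to it as $\rho_{\sqrt{\Sigma}}$.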

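Normalizing $\rho_{\sqrt{\Sigma}}$ by its total measure over $\mathrm{span}(\Sigma)$, we obtain the probability distribution function of the (continuous) Gaussian distribution $D_{\sqrt{\Sigma}}$. By linearity of expectation, this distribution has covariance $\mathbb{E}_{\mathbf{x} \leftarrow D_{\sqrt{\Sigma}}}[\mathbf{x} \cdot \mathbf{x}^t] = \frac{\Sigma}{2\pi}$. (The $\frac{1}{2\pi}$ factor is the variance of the Gaussian $D_1$, due to our choice of normalization.) For convenience, we implicitly ignore the $\frac{1}{2\pi}$ factor, and refer to $\Sigma$ as the covariance matrix of $D_{\sqrt{\Sigma}}$.

Let $\Lambda \subset \mathbb{R}^n$ be a lattice, let $\mathbf{c} \in \mathbb{R}^n$, and let $\Sigma \ge 0$ be a positive semidefinite matrix such that $(\Lambda + \mathbf{c}) \cap \mathrm{span}(\Sigma)$ is nonempty. The discrete Gaussian distribution $D_{\Lambda + \mathbf{c}, \sqrt{\Sigma}}$ is simply the Gaussian distribution $D_{\sqrt{\Sigma}}$ restricted to have support $\Lambda + \mathbf{c}$. That is, for all $\mathbf{x} \in \Lambda + \mathbf{c}$,

\[ D_{\Lambda + \mathbf{c}, \sqrt{\Sigma}}(\mathbf{x}) = \frac{\rho_{\sqrt{\Sigma}}(\mathbf{x})}{\rho_{\sqrt{\Sigma}}(\Lambda + \mathbf{c})} \propto \rho_{\sqrt{\Sigma}}(\mathbf{x}). \]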

We recall the definition of the smoothing parameter from [MR04], generalized to non-spherical (and potentially degenerate) Gaussians. It is easy to see that the definition is consistent with the partial ordering of positive semidefinite matrices, i.e., if $\Sigma_1 \ge \Sigma_2 \ge \eta_\epsilon(\Lambda)$, then $\Sigma_1 \ge \eta_\epsilon(\Lambda)$.

Definition 2.2. Let $\Sigma \ge 0$ and $\Lambda \subset \mathrm{span}(\Sigma)$ be a lattice. We say that $\sqrt{\Sigma} \ge \eta_\epsilon(\Lambda)$ if $\rho_{\sqrt{\Sigma^{+}}}(\Lambda^{*}) \le 1 + \epsilon$.

The following is a bound on the smoothing parameter in terms of any orthogonalized basis. Note that for practical choices like $n \le 2^{14}$ and $\epsilon \ge 2^{-80}$, the multiplicative factor attached to $\|\tilde{\mathbf{B}}\|$ is bounded by 4.6.

Lemma 2.3 ([GPV08, Theorem 3.1]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice with basis $\mathbf{B}$, and let $\epsilon > 0$. We have

\[ \eta_\epsilon(\Lambda) \le \|\tilde{\mathbf{B}}\| \cdot \sqrt{\ln(2n(1 + 1/\epsilon))/\pi}. \]

In particular, for any $\omega(\sqrt{\log n})$ function, there is a negligible $\epsilon(n)$ for which $\eta_\epsilon(\Lambda) \le \|\tilde{\mathbf{B}}\| \cdot \omega(\sqrt{\log n})$.
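The numerical remark preceding Lemma 2.3 can be checked directly. The following sketch (the function name is an illustrative assumption) evaluates the multiplicative factor $\sqrt{\ln(2n(1+1/\epsilon))/\pi}$ for given $n$ and $\epsilon$.

```python
import math

def smoothing_factor(n, eps):
    """Factor multiplying the Gram-Schmidt norm in the bound of Lemma 2.3."""
    return math.sqrt(math.log(2 * n * (1 + 1 / eps)) / math.pi)

# Example parameters from the text: n = 2^14, eps = 2^-80.
factor = smoothing_factor(2 ** 14, 2.0 ** -80)
```

For these parameters the factor evaluates to roughly 4.58, consistent with the stated bound of 4.6; smaller $n$ or larger $\epsilon$ only decrease it.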

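For appropriate parameters, the smoothing parameter of a random lattice $\Lambda^{\perp}(\mathbf{A})$ is small, with very high probability. The following bound is a refinement and strengthening of one from [GPV08], which allows for a more precise analysis of the parameters and statistical errors involved in our constructions.

Lemma 2.4. Let $n$, $m$, $q \ge 2$ be positive integers. For $\mathbf{s} \in \mathbb{Z}_q^n$, let the subgroup $G_{\mathbf{s}} = \{\langle \mathbf{a}, \mathbf{s} \rangle : \mathbf{a} \in \mathbb{Z}_q^n\} \subseteq \mathbb{Z}_q$, and let $g_{\mathbf{s}} = |G_{\mathbf{s}}| = q/\gcd(s_1, \ldots, s_n, q)$. Let $\epsilon > 0$, $\eta \ge \eta_\epsilon(\mathbb{Z}^m)$, and $s > \eta$ be reals. Then for uniformly random $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$,

\[ \mathbb{E}_{\mathbf{A}}\left[\rho_{1/s}(\Lambda^{\perp}(\mathbf{A})^{*})\right] \le (1 + \epsilon) \sum_{\mathbf{s} \in \mathbb{Z}_q^n} \max\{1/g_{\mathbf{s}}, \eta/s\}^m. \qquad (2.1) \]

In particular, if $q = p^e$ is a power of a prime $p$, and

\[ m \ge \max\left\{ n + \frac{\log(3 + 2/\epsilon)}{\log p}, \; \frac{n \log q + \log(2 + 2/\epsilon)}{\log(s/\eta)} \right\}, \qquad (2.2) \]

then $\mathbb{E}_{\mathbf{A}}[\rho_{1/s}(\Lambda^{\perp}(\mathbf{A})^{*})] \le 1 + 2\epsilon$, and so by Markov's inequality, $s \ge \eta_{2\epsilon/\delta}(\Lambda^{\perp}(\mathbf{A}))$ except with probability at most $\delta$.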


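Proof. We will use the fact (which follows from the Poisson summation formula; see [MR04, Lemma 2.8]) that $\rho_t(\Lambda) \le \rho_r(\Lambda) \le (r/t)^m \cdot \rho_t(\Lambda)$ for any rank-$m$ lattice $\Lambda$ and $r \ge t > 0$.

For any $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, one can check that $\Lambda^{\perp}(\mathbf{A})^{*} = \mathbb{Z}^m + \{\mathbf{A}^t\mathbf{s}/q : \mathbf{s} \in \mathbb{Z}_q^n\}$. Note that $\mathbf{A}^t\mathbf{s}$ is uniformly random over $G_{\mathbf{s}}^m$, for uniformly random $\mathbf{A}$. Then we have

\begin{align*}
\mathbb{E}_{\mathbf{A}}\left[\rho_{1/s}(\Lambda^{\perp}(\mathbf{A})^{*})\right]
&\le \sum_{\mathbf{s} \in \mathbb{Z}_q^n} \mathbb{E}_{\mathbf{A}}\left[\rho_{1/s}(\mathbb{Z}^m + \mathbf{A}^t\mathbf{s}/q)\right] && \text{(lin. of } \mathbb{E}\text{)} \\
&= \sum_{\mathbf{s} \in \mathbb{Z}_q^n} g_{\mathbf{s}}^{-m} \cdot \rho_{1/s}(g_{\mathbf{s}}^{-1} \cdot \mathbb{Z}^m) && \text{(avg. over } \mathbf{A}\text{)} \\
&\le \sum_{\mathbf{s} \in \mathbb{Z}_q^n} g_{\mathbf{s}}^{-m} \cdot \max\{1, g_{\mathbf{s}}\eta/s\}^m \cdot \rho_{1/\eta}(\mathbb{Z}^m) && \text{(above fact)} \\
&\le (1 + \epsilon) \sum_{\mathbf{s} \in \mathbb{Z}_q^n} \max\{1/g_{\mathbf{s}}, \eta/s\}^m && (\eta \ge \eta_\epsilon(\mathbb{Z}^m)).
\end{align*}

To prove the second part of the claim, observe that $g_{\mathbf{s}} = p^i$ for some $i \ge 0$, and that there are at most $g^n$ values of $\mathbf{s}$ for which $g_{\mathbf{s}} = g$, because each entry of $\mathbf{s}$ must be in $G_{\mathbf{s}}$. Therefore,

\[ \sum_{\mathbf{s} \in \mathbb{Z}_q^n} 1/g_{\mathbf{s}}^m \le \sum_{i \ge 0} p^{i(n-m)} = \frac{1}{1 - p^{n-m}} \le 1 + \frac{\epsilon}{2(1 + \epsilon)}. \]

(More generally, for arbitrary $q$ we have $\sum_{\mathbf{s}} 1/g_{\mathbf{s}}^m \le \zeta(m - n)$, where $\zeta(\cdot)$ is the Riemann zeta function.) Similarly, $\sum_{\mathbf{s}} (\eta/s)^m = q^n (s/\eta)^{-m} \le \frac{\epsilon}{2(1 + \epsilon)}$, and the claim follows. □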
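To make condition (2.2) concrete, the following sketch computes the smallest integer $m$ that satisfies it for a prime-power modulus. The function name and the toy parameters are illustrative assumptions; `ratio` stands for $s/\eta > 1$.

```python
import math

def min_columns(n, p, e, eps, ratio):
    """Smallest integer m satisfying (2.2) for q = p**e and s/eta = ratio > 1."""
    log_q = e * math.log(p)
    bound1 = n + math.log(3 + 2 / eps) / math.log(p)
    bound2 = (n * log_q + math.log(2 + 2 / eps)) / math.log(ratio)
    return math.ceil(max(bound1, bound2))

# Toy parameters (assumptions): n = 8, q = 2^4, eps = 1e-6, s/eta = 4.
m = min_columns(8, 2, 4, 1e-6, 4.0)
```

For such an $m$, both sums bounded at the end of the proof stay within their targets: $1/(1 - p^{n-m}) \le 1 + \frac{\epsilon}{2(1+\epsilon)}$ and $q^n (s/\eta)^{-m} \le \frac{\epsilon}{2(1+\epsilon)}$.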

We  need  a  number  of  standard  facts  about  discrete  Gaussians. 


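Lemma 2.5 ([MR04, Lemmas 2.9 and 4.1]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice. For any $\Sigma \ge 0$ and $\mathbf{c} \in \mathbb{R}^n$, we have $\rho_{\sqrt{\Sigma}}(\Lambda + \mathbf{c}) \le \rho_{\sqrt{\Sigma}}(\Lambda)$. Moreover, if $\sqrt{\Sigma} \ge \eta_\epsilon(\Lambda)$ for some $\epsilon > 0$ and $\mathbf{c} \in \mathrm{span}(\Lambda)$, then

\[ \rho_{\sqrt{\Sigma}}(\Lambda + \mathbf{c}) \ge \frac{1 - \epsilon}{1 + \epsilon} \cdot \rho_{\sqrt{\Sigma}}(\Lambda). \]

Combining the above lemma with a bound of Banaszczyk [Ban93], we have the following tail bound on discrete Gaussians.

Lemma 2.6 ([Ban93, Lemma 1.5]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice and $r \ge \eta_\epsilon(\Lambda)$ for some $\epsilon \in (0, 1)$. For any $\mathbf{c} \in \mathrm{span}(\Lambda)$, we have

\[ \Pr\left[\|D_{\Lambda + \mathbf{c}, r}\| \ge r\sqrt{n}\right] \le 2^{-n} \cdot \frac{1 + \epsilon}{1 - \epsilon}. \]

Moreover, if $\mathbf{c} = \mathbf{0}$ then the bound holds for any $r > 0$, with $\epsilon = 0$.

The next lemma bounds the predictability (i.e., probability of the most likely outcome or equivalently, min-entropy) of a discrete Gaussian.

Lemma 2.7 ([PR06, Lemma 2.11]). Let $\Lambda \subset \mathbb{R}^n$ be a lattice and $r \ge 2\eta_\epsilon(\Lambda)$ for some $\epsilon \in (0, 1)$. For any $\mathbf{c} \in \mathbb{R}^n$ and any $\mathbf{y} \in \Lambda + \mathbf{c}$, we have $\Pr[D_{\Lambda + \mathbf{c}, r} = \mathbf{y}] \le 2^{-n} \cdot \frac{1 + \epsilon}{1 - \epsilon}$.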


2.4  Subgaussian  Distributions  and  Random  Matrices 

For $\delta \ge 0$, we say that a random variable $X$ (or its distribution) over $\mathbb{R}$ is $\delta$-subgaussian with parameter $s > 0$ if for all $t \in \mathbb{R}$, the (scaled) moment-generating function satisfies

\[ \mathbb{E}[\exp(2\pi t X)] \le \exp(\delta) \cdot \exp(\pi s^2 t^2). \]

Notice that the $\exp(\pi s^2 t^2)$ term on the right is precisely the (scaled) moment-generating function of the Gaussian distribution $D_s$. So, our definition differs from the usual definition of subgaussian only in the additional factor of $\exp(\delta)$; we need this relaxation when working with discrete Gaussians, usually taking $\delta = \ln(\frac{1+\epsilon}{1-\epsilon}) \approx 2\epsilon$ for the same small $\epsilon$ as in the smoothing parameter $\eta_\epsilon$.

If $X$ is $\delta$-subgaussian, then its tails are dominated by a Gaussian of parameter $s$, i.e., $\Pr[|X| \ge t] \le 2\exp(\delta)\exp(-\pi t^2/s^2)$ for all $t \ge 0$.^4 This follows by Markov's inequality: by scaling $X$ we can assume $s = 1$, and we have

\[ \Pr[X \ge t] = \Pr[\exp(2\pi t X) \ge \exp(2\pi t^2)] \le \exp(\delta)\exp(\pi t^2)/\exp(2\pi t^2) = \exp(\delta)\exp(-\pi t^2). \]

The claim follows by repeating the argument with $-X$, and the union bound. Using the Taylor series expansion of $\exp(2\pi t X)$, it can be shown that any $B$-bounded symmetric random variable $X$ (i.e., $|X| \le B$ always) is 0-subgaussian with parameter $B\sqrt{2\pi}$.
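As a small numerical illustration of the last claim, consider the simplest $B$-bounded symmetric variable, uniform on $\{-B, +B\}$ (this choice of distribution, and all function names below, are assumptions made for the example). Its scaled moment-generating function is $\cosh(2\pi t B)$, which should not exceed $\exp(\pi s^2 t^2)$ for $s = B\sqrt{2\pi}$.

```python
import math

def mgf_rademacher(t, B):
    """Scaled MGF E[exp(2*pi*t*X)] for X uniform on {-B, +B}."""
    return math.cosh(2 * math.pi * t * B)

def subgaussian_bound(t, s, delta=0.0):
    """Right-hand side exp(delta) * exp(pi * s^2 * t^2) of the definition."""
    return math.exp(delta) * math.exp(math.pi * s * s * t * t)

def tail_bound(t, s, delta=0.0):
    """Tail bound 2 * exp(delta) * exp(-pi * t^2 / s^2) on Pr[|X| >= t]."""
    return 2 * math.exp(delta) * math.exp(-math.pi * t * t / (s * s))
```

The comparison reduces to the elementary inequality $\cosh(x) \le \exp(x^2/2)$ with $x = 2\pi t B$.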

More generally, we say that a random vector $\mathbf{x}$ or its distribution (respectively, a random matrix $\mathbf{X}$) is $\delta$-subgaussian (of parameter $s$) if all its one-dimensional marginals $\langle \mathbf{u}, \mathbf{x} \rangle$ (respectively, $\mathbf{u}^t\mathbf{X}\mathbf{v}$) for unit vectors $\mathbf{u}, \mathbf{v}$ are $\delta$-subgaussian (of parameter $s$). It follows immediately from the definition that the concatenation of independent $\delta_i$-subgaussian vectors with common parameter $s$, interpreted as either a vector or matrix, is $(\sum_i \delta_i)$-subgaussian with parameter $s$.

Lemma 2.8. Let $\Lambda \subset \mathbb{R}^n$ be a lattice and $s \ge \eta_\epsilon(\Lambda)$ for some $0 < \epsilon < 1$. For any $\mathbf{c} \in \mathrm{span}(\Lambda)$, $D_{\Lambda + \mathbf{c}, s}$ is $\ln(\frac{1+\epsilon}{1-\epsilon})$-subgaussian with parameter $s$. Moreover, it is 0-subgaussian for any $s > 0$ when $\mathbf{c} = \mathbf{0}$.

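Proof. By scaling $\Lambda$ we can assume that $s = 1$. Let $\mathbf{x}$ have distribution $D_{\Lambda + \mathbf{c}}$, and let $\mathbf{u} \in \mathbb{R}^n$ be any unit vector. We bound the scaled moment-generating function of the marginal $\langle \mathbf{x}, \mathbf{u} \rangle$ for any $t \in \mathbb{R}$:

\begin{align*}
\rho(\Lambda + \mathbf{c}) \cdot \mathbb{E}[\exp(2\pi \langle \mathbf{x}, t\mathbf{u} \rangle)]
&= \sum_{\mathbf{x} \in \Lambda + \mathbf{c}} \exp(-\pi(\langle \mathbf{x}, \mathbf{x} \rangle - 2\langle \mathbf{x}, t\mathbf{u} \rangle)) \\
&= \exp(\pi t^2) \cdot \sum_{\mathbf{x} \in \Lambda + \mathbf{c}} \exp(-\pi \langle \mathbf{x} - t\mathbf{u}, \mathbf{x} - t\mathbf{u} \rangle) \\
&= \exp(\pi t^2) \cdot \rho(\Lambda + \mathbf{c} - t\mathbf{u}).
\end{align*}

Both claims then follow by Lemma 2.5. □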

Here we recall a standard result from the non-asymptotic theory of random matrices; for further details, see [Ver11]. (The proof for $\delta$-subgaussian distributions is a trivial adaptation of the 0-subgaussian case.)

Lemma 2.9. Let $\mathbf{X} \in \mathbb{R}^{n \times m}$ be a $\delta$-subgaussian random matrix with parameter $s$. There exists a universal constant $C > 0$ such that for any $t \ge 0$, we have $s_1(\mathbf{X}) \le C \cdot s \cdot (\sqrt{m} + \sqrt{n} + t)$ except with probability at most $2\exp(\delta)\exp(-\pi t^2)$.

Empirically, for discrete Gaussians the universal constant $C$ in the above lemma is very close to $1/\sqrt{2\pi}$. In fact, it has been proved that $C \le 1/\sqrt{2\pi}$ for matrices with independent identically distributed continuous Gaussian entries.

^4 The converse also holds (up to a small constant factor in the parameter $s$) when $\mathbb{E}[X] = 0$, but this will frequently not quite be the case in our applications, which is why we define subgaussian in terms of the moment-generating function.

3  Search to Decision Reduction


Here we give a new search-to-decision reduction for LWE that essentially subsumes all of the (incomparable) prior ones given in [BFKL93, Reg05, Pei09b, ACPS09].^5 Most notably, it handles moduli $q$ that were not covered before, specifically, those like $q = 2^k$ that are divisible by powers of very small primes. The only known reduction that ours does not subsume is a different style of sample-preserving reduction recently given in [MM11], which works for a more limited class of moduli and error distributions; extending that reduction to the full range of parameters considered here is an interesting open problem. In what follows, $\omega(\sqrt{\log n})$ denotes some fixed function that grows faster than $\sqrt{\log n}$, asymptotically.


Theorem 3.1. Let $q$ have prime factorization $q = p_1^{e_1} \cdots p_k^{e_k}$ for pairwise distinct $\mathrm{poly}(n)$-bounded primes $p_i$ with each $e_i \ge 1$, and let $0 < \alpha \le 1/\omega(\sqrt{\log n})$. Let $\ell$ be the number of prime factors $p_i < \omega(\sqrt{\log n})/\alpha$. There is a probabilistic polynomial-time reduction from solving search-$\mathrm{LWE}_{q,\alpha}$ (in the worst case, with overwhelming probability) to solving decision-$\mathrm{LWE}_{q,\alpha'}$ (on the average, with non-negligible advantage) for any $\alpha' \ge \alpha$ such that $\alpha' \ge \omega(\sqrt{\log n})/p_i^{e_i}$ for every $i$, and $(\alpha')^{\ell} \ge \alpha \cdot \omega(\sqrt{\log n})^{1+\ell}$.

For example, when every $p_i \ge \omega(\sqrt{\log n})/\alpha$ we have $\ell = 0$, and any $\alpha' \ge \alpha$ is acceptable. (This special case, with the additional constraint that every $e_i = 1$, is proved in [Pei09b].) As a qualitatively new example, when $q = p^e$ is a prime power for some (possibly small) prime $p$, then it suffices to let $\alpha' \ge \alpha \cdot \omega(\sqrt{\log n})^2$. (A similar special case where $q = p^e$ for sufficiently large $p$ and $\alpha' = \alpha \ll 1/p$ is proved in [ACPS09].)


Proof. We show how to recover each entry of $\mathbf{s}$ modulo a large enough power of each $p_i$, given access to the distribution $A_{\mathbf{s},\alpha}$ for some $\mathbf{s} \in \mathbb{Z}_q^n$ and to an oracle $\mathcal{O}$ solving $\mathrm{DLWE}_{q,\alpha'}$. For the parameters in the theorem statement, we can then recover the remainder of $\mathbf{s}$ in polynomial time by rounding and standard Gaussian elimination.

First, observe that we can transform $A_{\mathbf{s},\alpha}$ into $A_{\mathbf{s},\alpha'}$ simply by adding (modulo 1) an independent sample from $D_{\sqrt{\alpha'^2 - \alpha^2}}$ to the second component of each $(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle/q + D_\alpha \bmod 1) \in \mathbb{Z}_q^n \times \mathbb{T}$ drawn from $A_{\mathbf{s},\alpha}$.

We now show how to recover each entry of $\mathbf{s}$ modulo (powers of) any prime $p = p_i$ dividing $q$. Let $e = e_i$, and for $j = 0, 1, \ldots, e$ define $A^{j}_{\mathbf{s},\alpha'}$ to be the distribution over $\mathbb{Z}_q^n \times \mathbb{T}$ obtained by drawing $(\mathbf{a}, b) \leftarrow A_{\mathbf{s},\alpha'}$ and outputting $(\mathbf{a}, b + r/p^j \bmod 1)$ for a fresh uniformly random $r \leftarrow \mathbb{Z}_q$. (Clearly, this distribution can be generated efficiently from $A_{\mathbf{s},\alpha'}$.) Note that when $\alpha' \ge \omega(\sqrt{\log n})/p^j \ge \eta_\epsilon((1/p^j)\mathbb{Z})$ for some $\epsilon = \mathrm{negl}(n)$, $A^{j}_{\mathbf{s},\alpha'}$ is negligibly far from $U = U(\mathbb{Z}_q^n \times \mathbb{T})$, and this holds at least for $j = e$ by hypothesis. Therefore, by a hybrid argument there exists some minimal $j \in [e]$ for which $\mathcal{O}$ has a non-negligible advantage in distinguishing between $A^{j-1}_{\mathbf{s},\alpha'}$ and $A^{j}_{\mathbf{s},\alpha'}$, over a random choice of $\mathbf{s}$ and all other randomness in the experiment. (This $j$ can be found efficiently by measuring the behavior of $\mathcal{O}$.) Note that when $p_i \ge \omega(\sqrt{\log n})/\alpha \ge \omega(\sqrt{\log n})/\alpha'$, the minimal $j$ must be 1; otherwise it may be larger, but there are at most $\ell$ of these by hypothesis. Now by standard random self-reduction and amplification techniques (e.g., [Reg05, Lemma 4.1]), we can in fact assume that $\mathcal{O}$ accepts (respectively, rejects) with overwhelming probability given $A^{j-1}_{\mathbf{s},\alpha'}$ (resp., $A^{j}_{\mathbf{s},\alpha'}$), for any $\mathbf{s} \in \mathbb{Z}_q^n$.


Given access to $A^{j-1}_{\mathbf{s},\alpha'}$ and $\mathcal{O}$, we can test whether $s_1 = 0 \bmod p$ by invoking $\mathcal{O}$ on samples from $A^{j-1}_{\mathbf{s},\alpha'}$ that have been transformed as follows (all of what follows is analogous for $s_2, \ldots, s_n$): take each sample $(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle/q + e + r/p^{j-1} \bmod 1) \leftarrow A^{j-1}_{\mathbf{s},\alpha'}$ to

\[ (\mathbf{a}' = \mathbf{a} - r' \cdot (q/p^j) \cdot \mathbf{e}_1, \; b' = b = \langle \mathbf{a}', \mathbf{s} \rangle/q + e + (pr + r's_1)/p^j \bmod 1) \qquad (3.1) \]

for a fresh $r' \leftarrow \mathbb{Z}_q$ (where $\mathbf{e}_1 = (1, 0, \ldots, 0) \in \mathbb{Z}_q^n$). Observe that if $s_1 = 0 \bmod p$, the transformed samples are also drawn from $A^{j-1}_{\mathbf{s},\alpha'}$, otherwise they are drawn from $A^{j}_{\mathbf{s},\alpha'}$, because $r's_1$ is uniformly random modulo $p$. Therefore, $\mathcal{O}$ tells us which is the case.

^5 We say "essentially subsumes" because our reduction is not very meaningful when $q$ is itself a very small prime, whereas those of [BFKL93, Reg05] are meaningful. This is only because our reduction deals with the continuous version of LWE. If we discretize the problem, then for very small prime $q$ our reduction specializes to those of [BFKL93, Reg05].
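The algebra behind transformation (3.1) can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names are assumptions, the noise term is a caller-supplied rational (set to zero in a check so that the identity is exact), and the toy modulus is chosen by us.

```python
from fractions import Fraction
import random

def inner(a, s, q):
    """Inner product <a, s> reduced modulo q."""
    return sum(x * y for x, y in zip(a, s)) % q

def hybrid_sample(s, q, p, j, noise, rng):
    """Sample from A^j_{s,alpha'}: b = <a,s>/q + noise + r/p^j mod 1.
    Also returns r so that the identity in (3.1) can be verified."""
    a = [rng.randrange(q) for _ in s]
    r = rng.randrange(q)
    b = (Fraction(inner(a, s, q), q) + noise + Fraction(r, p ** j)) % 1
    return a, b, r

def transform(a, b, q, p, j, r_prime):
    """Map (a, b) to (a' = a - r' * (q / p^j) * e_1, b' = b) as in (3.1)."""
    a_new = list(a)
    a_new[0] = (a_new[0] - r_prime * (q // p ** j)) % q
    return a_new, b
```

Applying `transform` to a sample drawn with index $j-1$ leaves $b$ unchanged but, relative to the new $\mathbf{a}'$, shifts the offset to $(pr + r's_1)/p^j$; when $p$ divides $s_1$ this offset is still a multiple of $1/p^{j-1}$.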

Using the above test, we can efficiently recover $s_1 \bmod p$ by 'shifting' $s_1$ by each of $0, \ldots, p - 1 \bmod p$ using the standard transformation that maps $A_{\mathbf{s},\alpha'}$ to $A_{\mathbf{s}+\mathbf{t},\alpha'}$ for any desired $\mathbf{t} \in \mathbb{Z}_q^n$, by taking $(\mathbf{a}, b)$ to $(\mathbf{a}, b + \langle \mathbf{a}, \mathbf{t} \rangle/q \bmod 1)$. (This enumeration step is where we use the fact that every $p_i$ is $\mathrm{poly}(n)$-bounded.) Moreover, we can iteratively recover $s_1 \bmod p^2, \ldots, p^{e-j+1}$ as follows: having recovered $s_1 \bmod p^i$, first 'shift' $A_{\mathbf{s},\alpha'}$ to $A_{\mathbf{s}',\alpha'}$ where $s'_1 = 0 \bmod p^i$, then apply a similar procedure as above to recover $s'_1 \bmod p^{i+1}$: specifically, just modify the transformation in (3.1) to let $\mathbf{a}' = \mathbf{a} - r' \cdot (q/p^{j+i}) \cdot \mathbf{e}_1$, so that $b' = b = \langle \mathbf{a}', \mathbf{s} \rangle/q + e + (pr + r'(s'_1/p^i))/p^j$. This procedure works as long as $p^{j+i}$ divides $q$, so we can recover $s_1 \bmod p^{e-j+1}$.
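The 'shift' transformation is a one-line operation on samples. A minimal sketch follows, using exact rationals for the second component; the function name and the representation of $b$ as a `Fraction` are assumptions for illustration.

```python
from fractions import Fraction

def shift_sample(a, b, t, q):
    """Map a sample for secret s to a sample for secret s + t (mod q):
    (a, b) -> (a, b + <a, t>/q mod 1). The noise term is unaffected."""
    inner_at = sum(x * y for x, y in zip(a, t)) % q
    return a, (b + Fraction(inner_at, q)) % 1
```

Because the map only adds the known quantity $\langle \mathbf{a}, \mathbf{t} \rangle/q$, it can be applied without any knowledge of $\mathbf{s}$, which is what makes the enumeration over candidate residues possible.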

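Using the above reductions and the Chinese remainder theorem, and letting $j_i$ be the above minimal value of $j$ for $p = p_i$ (of which at most $\ell$ are greater than 1), from $A_{\mathbf{s},\alpha}$ we can recover $\mathbf{s}$ modulo

\[ P = \prod_i p_i^{e_i - (j_i - 1)} = q \Big/ \prod_i p_i^{j_i - 1} \ge q \cdot \left(\frac{\alpha'}{\omega(\sqrt{\log n})}\right)^{\ell} \ge q \cdot \alpha \cdot \omega(\sqrt{\log n}), \]

because $\alpha' < \omega(\sqrt{\log n})/p_i^{j_i - 1}$ for all $i$ by definition of $j_i$, and by hypothesis on $\alpha'$. By applying the 'shift' transformation to $A_{\mathbf{s},\alpha}$ we can assume that $\mathbf{s} = \mathbf{0} \bmod P$. Now every $\langle \mathbf{a}, \mathbf{s} \rangle/q$ is an integer multiple of $P/q \ge \alpha \cdot \omega(\sqrt{\log n})$, and since every noise term $e \leftarrow D_\alpha$ has magnitude $< (\alpha/2) \cdot \omega(\sqrt{\log n})$ with overwhelming probability, we can round the second component of every $(\mathbf{a}, b) \leftarrow A_{\mathbf{s},\alpha}$ to the exact value of $\langle \mathbf{a}, \mathbf{s} \rangle/q \bmod 1$. From these we can solve for $\mathbf{s}$ by Gaussian elimination, and we are done. □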


4  Primitive  Lattices 

At the heart of our new trapdoor generation algorithm (described in Section 5) is the construction of a very special family of lattices which have excellent geometric properties, and admit very fast and parallelizable decoding algorithms. The lattices are defined by means of what we call a primitive matrix. We say that a matrix $\mathbf{G} \in \mathbb{Z}_q^{n \times m}$ is primitive if its columns generate all of $\mathbb{Z}_q^n$, i.e., $\mathbf{G} \cdot \mathbb{Z}^m = \mathbb{Z}_q^n$.^6

The  main  results  of  this  section  are  summarized  in  the  following  theorem. 

Theorem 4.1. For any integers $q \ge 2$, $n \ge 1$, $k = \lceil \log_2 q \rceil$ and $m = nk$, there is a primitive matrix $\mathbf{G} \in \mathbb{Z}_q^{n \times m}$ such that

•  The lattice $\Lambda^{\perp}(\mathbf{G})$ has a known basis $\mathbf{S} \in \mathbb{Z}^{m \times m}$ with $\|\tilde{\mathbf{S}}\| \le \sqrt{5}$ and $\|\mathbf{S}\| \le \max\{\sqrt{5}, \sqrt{k}\}$. Moreover, when $q = 2^k$, we have $\tilde{\mathbf{S}} = 2\mathbf{I}$ (so $\|\tilde{\mathbf{S}}\| = 2$) and $\|\mathbf{S}\| = \sqrt{5}$.

•  Both $\mathbf{G}$ and $\mathbf{S}$ require little storage. In particular, they are sparse (with only $O(m)$ nonzero entries) and highly structured.

•  Inverting $g_{\mathbf{G}}(\mathbf{s}, \mathbf{e}) := \mathbf{s}^t\mathbf{G} + \mathbf{e}^t \bmod q$ can be performed in quasilinear $O(n \cdot \log^c n)$ time for any $\mathbf{s} \in \mathbb{Z}_q^n$ and any $\mathbf{e} \in \mathcal{P}_{1/2}(q \cdot \mathbf{B}^{-t})$, where $\mathbf{B}$ can denote either $\mathbf{S}$ or $\tilde{\mathbf{S}}$. Moreover, the algorithm is perfectly parallelizable, running in polylogarithmic $O(\log^c n)$ time using $n$ processors. When $q = 2^k$, the polylogarithmic term $O(\log^c n)$ is essentially just the cost of $k$ additions and shifts on $k$-bit integers.

•  Preimage sampling for $f_{\mathbf{G}}(\mathbf{x}) = \mathbf{G}\mathbf{x} \bmod q$ with Gaussian parameter $s \ge \|\tilde{\mathbf{S}}\| \cdot \omega(\sqrt{\log n})$ can be performed in quasilinear $O(n \cdot \log^c n)$ time, or parallel polylogarithmic $O(\log^c n)$ time using $n$ processors. When $q = 2^k$, the polylogarithmic term is essentially just the cost of $k$ additions and shifts on $k$-bit integers, plus the (offline) generation of about $m$ random integers drawn from $D_{\mathbb{Z},s}$.

More generally, for any integer $b \ge 2$, all of the above statements hold with $k = \lceil \log_b q \rceil$, $\|\tilde{\mathbf{S}}\| \le \sqrt{b^2 + 1}$, and $\|\mathbf{S}\| \le \max\{\sqrt{b^2 + 1}, (b - 1)\sqrt{k}\}$; and when $q = b^k$, we have $\tilde{\mathbf{S}} = b\mathbf{I}$ and $\|\mathbf{S}\| = \sqrt{b^2 + 1}$.

^6 We do not say that $\mathbf{G}$ is "full-rank," because $\mathbb{Z}_q$ is not a field when $q$ is not prime, and the notion of rank for matrices over $\mathbb{Z}_q$ is not well defined.

The rest of this section is dedicated to the proof of Theorem 4.1. In the process, we also make several important observations regarding the implementation of the inversion and sampling algorithms associated with $\mathbf{G}$, showing that our algorithms are not just asymptotically fast, but also quite practical.

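Let $q \ge 2$ be an integer modulus and $k \ge 1$ be an integer dimension. Our construction starts with a primitive vector $\mathbf{g} \in \mathbb{Z}_q^k$, i.e., a vector such that $\gcd(g_1, \ldots, g_k, q) = 1$. The vector $\mathbf{g}$ defines a $k$-dimensional lattice $\Lambda^{\perp}(\mathbf{g}^t) \subset \mathbb{Z}^k$ having determinant $|\mathbb{Z}^k/\Lambda^{\perp}(\mathbf{g}^t)| = q$, because the residue classes of $\mathbb{Z}^k/\Lambda^{\perp}(\mathbf{g}^t)$ are in bijective correspondence with the possible values of $\langle \mathbf{g}, \mathbf{x} \rangle \bmod q$ for $\mathbf{x} \in \mathbb{Z}^k$, which cover all of $\mathbb{Z}_q$ since $\mathbf{g}$ is primitive. Concrete primitive vectors $\mathbf{g}$ will be described in the next subsections. Notice that when $q = \mathrm{poly}(n)$, we have $k = O(\log q) = O(\log n)$ and so $\Lambda^{\perp}(\mathbf{g}^t)$ is a very low-dimensional lattice. Let $\mathbf{S}_k \in \mathbb{Z}^{k \times k}$ be a basis of $\Lambda^{\perp}(\mathbf{g}^t)$, that is, $\mathbf{g}^t \cdot \mathbf{S}_k = \mathbf{0} \in \mathbb{Z}_q^{1 \times k}$ and $|\det(\mathbf{S}_k)| = q$.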

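The primitive vector $\mathbf{g}$ and associated basis $\mathbf{S}_k$ are used to define the parity-check matrix $\mathbf{G}$ and basis $\mathbf{S}$ as $\mathbf{G} := \mathbf{I}_n \otimes \mathbf{g}^t \in \mathbb{Z}_q^{n \times nk}$ and $\mathbf{S} := \mathbf{I}_n \otimes \mathbf{S}_k \in \mathbb{Z}^{nk \times nk}$. That is,

\[ \mathbf{G} := \begin{bmatrix} \cdots \mathbf{g}^t \cdots & & & \\ & \cdots \mathbf{g}^t \cdots & & \\ & & \ddots & \\ & & & \cdots \mathbf{g}^t \cdots \end{bmatrix} \in \mathbb{Z}_q^{n \times nk}, \qquad \mathbf{S} := \begin{bmatrix} \mathbf{S}_k & & & \\ & \mathbf{S}_k & & \\ & & \ddots & \\ & & & \mathbf{S}_k \end{bmatrix} \in \mathbb{Z}^{nk \times nk}. \]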

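Equivalently, $\mathbf{G}$, $\Lambda^{\perp}(\mathbf{G})$, and $\mathbf{S}$ are the direct sums of $n$ copies of $\mathbf{g}^t$, $\Lambda^{\perp}(\mathbf{g}^t)$, and $\mathbf{S}_k$, respectively. It follows that $\mathbf{G}$ is a primitive matrix, the lattice $\Lambda^{\perp}(\mathbf{G}) \subset \mathbb{Z}^{nk}$ has determinant $q^n$, and $\mathbf{S}$ is a basis for this lattice. It also follows (and is clear by inspection) that $\|\mathbf{S}\| = \|\mathbf{S}_k\|$ and $\|\tilde{\mathbf{S}}\| = \|\tilde{\mathbf{S}}_k\|$.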
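The direct-sum structure is easy to reproduce in code. The following sketch builds $\mathbf{I}_n \otimes \mathbf{g}^t$ and $\mathbf{I}_n \otimes \mathbf{S}_k$ with plain nested lists; the helper names and the toy parameters ($q = 8$, $k = 3$, $n = 2$, with one valid choice of $\mathbf{S}_k$ for this $q$) are assumptions for illustration.

```python
def kron_identity(n, block):
    """Return I_n (x) block as a list of rows (block-diagonal with n copies)."""
    rows, cols = len(block), len(block[0])
    out = [[0] * (n * cols) for _ in range(n * rows)]
    for i in range(n):
        for r in range(rows):
            for c in range(cols):
                out[i * rows + r][i * cols + c] = block[r][c]
    return out

def matmul_mod(A, B, q):
    """Matrix product A * B with entries reduced modulo q."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % q
             for j in range(len(B[0]))] for i in range(len(A))]

# Toy parameters (illustrative assumptions): q = 8, k = 3, n = 2.
q, k, n = 8, 3, 2
g_t = [[1, 2, 4]]                                   # the row vector g^t
S_k = [[2, 0, 0], [-1, 2, 0], [0, -1, 2]]           # a basis of the k-dim lattice
G = kron_identity(n, g_t)                           # n x nk parity-check matrix
S = kron_identity(n, S_k)                           # nk x nk basis
```

Since every column of $\mathbf{S}$ lies in $\Lambda^{\perp}(\mathbf{G})$, the product `matmul_mod(G, S, q)` is the all-zero matrix.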

By this direct sum construction, it is immediate that inverting $g_{\mathbf{G}}(\mathbf{s}, \mathbf{e})$ and sampling preimages of $f_{\mathbf{G}}(\mathbf{x})$ can be accomplished by performing the same operations $n$ times in parallel for $g_{\mathbf{g}^t}$ and $f_{\mathbf{g}^t}$ on the corresponding portions of the input, and concatenating the results. For preimage sampling, if each of the $f_{\mathbf{g}^t}$ preimages has Gaussian parameter $\sqrt{\Sigma}$, then by independence, their concatenation has parameter $\mathbf{I}_n \otimes \sqrt{\Sigma}$. Likewise, inverting $g_{\mathbf{G}}$ will succeed whenever all the $n$ independent $g_{\mathbf{g}^t}$-inversion subproblems are solved correctly.

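In the next two subsections we study concrete instantiations of the primitive vector $\mathbf{g}$, and give optimized algorithms for inverting $g_{\mathbf{g}^t}$ and sampling preimages for $f_{\mathbf{g}^t}$. In both subsections, we consider primitive lattices $\Lambda^{\perp}(\mathbf{g}^t) \subset \mathbb{Z}^k$ defined by the vector

\[ \mathbf{g}^t := [1 \;\; 2 \;\; 4 \;\; \cdots \;\; 2^{k-1}] \in \mathbb{Z}_q^{1 \times k}, \qquad k = \lceil \log_2 q \rceil, \qquad (4.1) \]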

whose  entries  form  a  geometrically  increasing  sequence.  (We  focus  on  powers  of  2,  but  all  our  results 
trivially  extend  to  other  integer  powers,  or  even  mixed-integer  products.)  The  only  difference  between 
the  two  subsections  is  in  the  form  of  the  modulus  q.  We  first  study  the  case  when  the  modulus  q  =  2k 
is  a  power  of  2,  which  leads  to  especially  simple  and  fast  algorithms.  Then  we  discuss  how  the  results 
can be generalized to arbitrary moduli $q$. Notice that in both cases, the syndrome $\langle \mathbf{g}, \mathbf{x} \rangle \in \mathbb{Z}_q$ of a binary vector $\mathbf{x} = (x_0, \ldots, x_{k-1}) \in \{0, 1\}^k$ is just the positive integer with binary expansion $\mathbf{x}$. In general, for arbitrary $\mathbf{x} \in \mathbb{Z}^k$ the syndrome $\langle \mathbf{g}, \mathbf{x} \rangle \in \mathbb{Z}_q$ can be computed very efficiently by a sequence of $k$ additions and binary shifts, and a single reduction modulo $q$, which is also trivial when $q = 2^k$ is a power of 2. The syndrome computation is also easily parallelizable, leading to $O(\log k) = O(\log \log n)$ computation time using $O(k) = O(\log n)$ processors.
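The sequential shift-and-add evaluation of the syndrome amounts to Horner's rule. A minimal sketch (the function name is an assumption):

```python
def syndrome(x, q):
    """Compute <g, x> mod q for g = (1, 2, ..., 2^(k-1)) via Horner's rule."""
    acc = 0
    for xi in reversed(x):
        acc = (acc << 1) + xi   # one shift and one addition per entry
    return acc % q              # single reduction modulo q at the end
```

For a binary vector the result is the integer whose binary expansion is $\mathbf{x}$ (least significant digit first); for example, `syndrome([1, 0, 1, 1], 16)` is $1 + 4 + 8 = 13$. Entries of $\mathbf{x}$ may also be negative or larger than 1.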


4.1  Power-of-Two  Modulus 

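Let $q = 2^k$ be a power of 2, and let $\mathbf{g}$ be the geometric vector defined in Equation (4.1). Define the matrix

\[ \mathbf{S}_k := \begin{bmatrix} 2 & & & & \\ -1 & 2 & & & \\ & -1 & \ddots & & \\ & & \ddots & 2 & \\ & & & -1 & 2 \end{bmatrix} \in \mathbb{Z}^{k \times k}. \]

This is a basis for $\Lambda^{\perp}(\mathbf{g}^t)$, because $\mathbf{g}^t \cdot \mathbf{S}_k = \mathbf{0} \bmod q$ and $\det(\mathbf{S}_k) = 2^k = q$. Clearly, all the basis vectors are short. Moreover, by orthogonalizing $\mathbf{S}_k$ in reverse order, we have $\tilde{\mathbf{S}}_k = 2 \cdot \mathbf{I}_k$. This construction is summarized in the following proposition. (It generalizes in the obvious way to any integer base, not just 2.)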
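These properties of the power-of-two basis can be verified with exact arithmetic. The sketch below (helper names are assumptions) builds the bidiagonal matrix with 2 on the diagonal and $-1$ on the subdiagonal, and runs Gram-Schmidt without normalization on its columns in a chosen order.

```python
from fractions import Fraction

def basis_pow2(k):
    """Columns s_1..s_k of S_k: 2 on the diagonal, -1 just below it."""
    S = [[0] * k for _ in range(k)]
    for j in range(k):
        S[j][j] = 2
        if j + 1 < k:
            S[j + 1][j] = -1
    return S

def columns(M):
    """Return the columns of M as a list of vectors."""
    return [list(col) for col in zip(*M)]

def gram_schmidt(vectors):
    """Gram-Schmidt (no normalization) in the given order, exact arithmetic."""
    out = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for u in out:
            mu = sum(a * b for a, b in zip(v, u)) / sum(a * a for a in u)
            w = [wi - mu * ui for wi, ui in zip(w, u)]
        out.append(w)
    return out
```

Calling `gram_schmidt(columns(basis_pow2(k))[::-1])` processes the columns from last to first and yields the vectors $2\mathbf{e}_k, 2\mathbf{e}_{k-1}, \ldots, 2\mathbf{e}_1$, matching $\tilde{\mathbf{S}}_k = 2 \cdot \mathbf{I}_k$.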

Proposition 4.2. For $q = 2^k$ and $\mathbf{g} = (1, 2, \ldots, 2^{k-1}) \in \mathbb{Z}_q^k$, the lattice $\Lambda^{\perp}(\mathbf{g}^t)$ has a basis $\mathbf{S}$ such that $\tilde{\mathbf{S}} = 2\mathbf{I}$ and $\|\mathbf{S}\| \le \sqrt{5}$. In particular, $\eta_\epsilon(\Lambda^{\perp}(\mathbf{g}^t)) \le 2r = 2 \cdot \omega(\sqrt{\log n})$ for some $\epsilon(n) = \mathrm{negl}(n)$.

Using Proposition 4.2 and known generic algorithms [Bab85, Kle00, GPV08], it is possible to invert $g_{\mathbf{g}^t}(s, \mathbf{e})$ correctly whenever $\mathbf{e} \in \mathcal{P}_{1/2}((q/2) \cdot \mathbf{I})$, and sample preimages under $f_{\mathbf{g}^t}$ with Gaussian parameter $s \ge 2r = 2 \cdot \omega(\sqrt{\log n})$. In what follows we show how the special structure of the basis $\mathbf{S}$ leads to simpler, faster, and more practical solutions to these general lattice problems.


Inversion. Here we show how to efficiently find an unknown scalar $s \in \mathbb{Z}_q$ given $\mathbf{b}^t = [b_0, b_1, \ldots, b_{k-1}] = s \cdot \mathbf{g}^t + \mathbf{e}^t = [s + e_0, 2s + e_1, \ldots, 2^{k-1}s + e_{k-1}] \bmod q$, where $\mathbf{e} \in \mathbb{Z}^k$ is a short error vector.

An iterative algorithm works by recovering the binary digits $s_0, s_1, \ldots, s_{k-1} \in \{0, 1\}$ of $s \in \mathbb{Z}_q$, from least to most significant, as follows: first, determine $s_0$ by testing whether

\[ b_{k-1} = 2^{k-1}s + e_{k-1} = (q/2)s_0 + e_{k-1} \bmod q \]

is closer to 0 or to $q/2$ (modulo $q$). Then recover $s_1$ from $b_{k-2} = 2^{k-2}s + e_{k-2} = 2^{k-1}s_1 + 2^{k-2}s_0 + e_{k-2} \bmod q$, by subtracting $2^{k-2}s_0$ and testing proximity to 0 or $q/2$, etc. It is easy to see that the algorithm produces correct output if every $e_i \in [-\frac{q}{4}, \frac{q}{4})$, i.e., if $\mathbf{e} \in \mathcal{P}_{1/2}(q \cdot \mathbf{I}_k/2) = \mathcal{P}_{1/2}(q \cdot (\tilde{\mathbf{S}}_k)^{-t})$. It can also be seen that this algorithm is exactly Babai's "nearest-plane" algorithm [Bab85], specialized to the scaled dual $q(\mathbf{S}_k)^{-t}$ of the basis $\mathbf{S}_k$ of $\Lambda^{\perp}(\mathbf{g}^t)$, which is a basis for $\Lambda(\mathbf{g})$.

Formally, the iterative algorithm is: given a vector $\mathbf{b}^t = [b_0, \ldots, b_{k-1}] \in \mathbb{Z}_q^{1 \times k}$, initialize $s \leftarrow 0$.

1.  For $i = k - 1, \ldots, 0$: let $s \leftarrow s + 2^{k-1-i} \cdot [b_i - 2^i \cdot s \notin [-\frac{q}{4}, \frac{q}{4}) \bmod q]$, where $[E] = 1$ if expression $E$ is true, and 0 otherwise. Also let $e_i \leftarrow b_i - 2^i \cdot s \in [-\frac{q}{4}, \frac{q}{4})$.

2.  Output $s \in \mathbb{Z}_q$ and $\mathbf{e} = (e_0, \ldots, e_{k-1}) \in [-\frac{q}{4}, \frac{q}{4})^k \subset \mathbb{Z}^k$.

Note that for $x \in \{0, \ldots, q - 1\}$ with binary representation $(x_{k-1}x_{k-2} \cdots x_0)_2$, we have

\[ [x \notin [-\tfrac{q}{4}, \tfrac{q}{4}) \bmod q] = x_{k-1} \oplus x_{k-2}. \]
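The iterative inversion algorithm above translates almost line by line into code. The following is a sketch under the stated assumptions ($q = 2^k$, integer errors in $[-q/4, q/4)$); the function name is ours, and the membership test is written with integer comparisons rather than the XOR shortcut.

```python
def invert_g(b, k):
    """Recover (s, e) from b_i = 2^i * s + e_i mod q, where q = 2^k
    and every e_i is an integer in [-q/4, q/4)."""
    q = 1 << k
    s = 0
    e = [0] * k
    for i in range(k - 1, -1, -1):
        x = (b[i] - (s << i)) % q
        # Indicator [x not in [-q/4, q/4) mod q]; comparing 4x with q
        # avoids fractions when q < 4.
        bit = 1 if q <= 4 * x < 3 * q else 0
        s += bit << (k - 1 - i)
        # With the new digit included, the residue is the error e_i.
        d = (b[i] - (s << i)) % q
        e[i] = d - q if 2 * d >= q else d
    return s, e
```

Each iteration uses only shifts, additions, and a comparison on $k$-bit integers, in line with the cost estimate in Theorem 4.1.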

There is also a non-iterative approach to decoding using a lookup table, and a hybrid approach between the two extremes. Notice that rounding each entry $b_i$ of $\mathbf{b}$ to the nearest multiple of $2^i$ (modulo $q$, breaking ties upward) before running the above algorithm does not change the value of $s$ that is computed. This lets us precompute a lookup table that maps the $2^{k(k+1)/2} = q^{O(\lg q)}$ possible rounded values of $\mathbf{b}$ to the correct values of $s$. The size of this table grows very rapidly for $k > 3$, but in this case we can do better if we assume slightly smaller error terms $e_i \in [-\frac{q}{8}, \frac{q}{8})$: simply round each $b_i$ to the nearest multiple of $\max\{\frac{q}{8}, 2^i\}$, thus producing one of exactly $8^{k-1} = q^3/8$ possible results, whose solutions can be stored in a lookup table. Note that the result is correct, because in each coordinate the total error introduced by $e_i$ and rounding to a multiple of $\frac{q}{8}$ is in the range $[-\frac{q}{4}, \frac{q}{4})$. A hybrid approach combining the iterative algorithm with table lookups of $\ell$ bits of $s$ at a time is potentially the most efficient option in practice, and is easy to devise from the above discussion.

Gaussian sampling. We now consider the preimage sampling problem for function $f_{\mathbf{g}^t}$, i.e., the task of Gaussian sampling over a desired coset of $\Lambda^{\perp}(\mathbf{g}^t)$. More specifically, we want to sample a vector from the set $\Lambda^{\perp}_u(\mathbf{g}^t) = \{\mathbf{x} \in \mathbb{Z}^k : \langle \mathbf{g}, \mathbf{x} \rangle = u \bmod q\}$ for a desired syndrome $u \in \mathbb{Z}_q$, with probability proportional to $\rho_s(\mathbf{x})$. We wish to do so for any fixed Gaussian parameter $s \ge \|\tilde{\mathbf{S}}_k\| \cdot r = 2 \cdot \omega(\sqrt{\log n})$, which is an optimal bound on the smoothing parameter of $\Lambda^{\perp}(\mathbf{G})$.

As with inversion, there are two main approaches to Gaussian sampling, which are actually opposite extremes on a spectrum of storage/parallelism trade-offs. The first approach is essentially to precompute and store many independent samples $\mathbf{x} \leftarrow D_{\mathbb{Z}^k,s}$, 'bucketing' them based on the value of $\langle \mathbf{g}, \mathbf{x} \rangle \in \mathbb{Z}_q$ until there is at least one sample per bucket. Because each $\langle \mathbf{g}, \mathbf{x} \rangle$ is statistically close to uniform over $\mathbb{Z}_q$ (by the smoothing parameter bound for $\Lambda^{\perp}(\mathbf{g}^t)$), a coupon-collecting argument implies that we need to generate about $q \log q$ samples to occupy every bucket. The online part of the sampling algorithm for $\Lambda^{\perp}_u(\mathbf{g}^t)$ is trivial, merely taking a fresh $\mathbf{x}$ from the appropriate bucket. The downside is that the storage and precomputation requirements are rather high: in many applications, $q$ (while polynomial in the security parameter) can be in the many thousands or more.

The second approach exploits the niceness of the orthogonalized basis $\tilde{\mathbf{S}}_k = 2\mathbf{I}_k$. Using this basis, the randomized nearest-plane algorithm of [Kle00, GPV08] becomes very simple and efficient, and is equivalent to the following: given a syndrome $u \in \{0, \ldots, q - 1\}$ (viewed as an integer),

1.  For $i = 0, \ldots, k - 1$: choose $x_i \leftarrow D_{2\mathbb{Z}+u,s}$ and let $u \leftarrow (u - x_i)/2 \in \mathbb{Z}$.

2.  Output $\mathbf{x} = (x_0, \ldots, x_{k-1})$.

Observe that every Gaussian $x_i$ in the above algorithm is chosen from one of only two possible cosets of $2\mathbb{Z}$, determined by the least significant bit of $u$ at that moment. Therefore, we may precompute and store several independent Gaussian samples from each of $2\mathbb{Z}$ and $2\mathbb{Z} + 1$, and consume one per iteration when executing the algorithm. (As above, the individual samples may be generated by choosing several $x \leftarrow D_{\mathbb{Z},s}$ and bucketing each one according to its least-significant bit.) Such presampling makes the algorithm deterministic during its online phase, and because there are only two cosets, there is almost no wasted storage or precomputation. Notice, however, that this algorithm requires $k = \lg(q)$ sequential iterations.
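The bit-by-bit sampler can be sketched as follows. The coset sampler here is a naive rejection sampler over a truncated window, which is an assumption made for simplicity and not the presampling strategy described above; function names and toy parameters are likewise illustrative.

```python
import math
import random

def sample_coset_2z(c, s, rng):
    """Naive rejection sampler for D_{2Z+c,s}. Candidates z = 2y + (c mod 2)
    are drawn with |y| <= ceil(6*s); the mass outside this window is
    negligible (truncation is an assumption of this sketch)."""
    bound = int(math.ceil(6 * s))
    while True:
        z = 2 * rng.randint(-bound, bound) + (c % 2)
        if rng.random() < math.exp(-math.pi * z * z / (s * s)):
            return z

def sample_preimage(u, k, s, rng):
    """Sample x in Z^k with <g, x> = u mod 2^k, one coordinate per iteration."""
    x = []
    for _ in range(k):
        xi = sample_coset_2z(u, s, rng)
        x.append(xi)
        u = (u - xi) // 2      # exact division: u - xi is even
    return x
```

Whatever randomness is used, the output satisfies the syndrome equation, because the loop maintains the invariant that the original $u$ equals $\sum_{t < i} 2^t x_t + 2^i u_i$ after $i$ iterations.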

Between the extremes of the two algorithms described above, there is a hybrid algorithm that chooses
$\ell \geq 1$ entries of $\mathbf{x}$ at a time. (For simplicity, we assume that $\ell$ divides $k$ exactly, though this is not
strictly necessary.) Let $\mathbf{h}^t = [1, 2, \ldots, 2^{\ell-1}] \in \mathbb{Z}_{2^\ell}^{1\times\ell}$ be a parity-check matrix defining the $2^\ell$-ary lattice
$\Lambda^{\perp}(\mathbf{h}^t) \subseteq \mathbb{Z}^\ell$, and observe that $\mathbf{g}^t = [\mathbf{h}^t, 2^\ell \cdot \mathbf{h}^t, \ldots, 2^{k-\ell} \cdot \mathbf{h}^t]$. The hybrid algorithm then works as follows:

1. For $i = 0, \ldots, k/\ell - 1$, choose $(x_{i\ell}, \ldots, x_{(i+1)\ell-1}) \leftarrow D_{\Lambda^{\perp}_{u \bmod 2^\ell}(\mathbf{h}^t),s}$ and let $u \leftarrow (u - x)/2^\ell$, where
$x = \sum_{j=0}^{\ell-1} x_{i\ell+j} \cdot 2^j \in \mathbb{Z}$.

2. Output $\mathbf{x} = (x_0, \ldots, x_{k-1})$.

As above, we can precompute samples $\mathbf{x} \leftarrow D_{\mathbb{Z}^\ell,s}$ and store them in a lookup table having $2^\ell$ buckets,
indexed by the value $\langle \mathbf{h}, \mathbf{x} \rangle \in \mathbb{Z}_{2^\ell}$, thereby making the algorithm deterministic in its online phase.
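A minimal sketch of the hybrid variant follows, again in Python and again using a rounded continuous Gaussian as a hypothetical stand-in for the discrete Gaussian. The names `presample_hybrid` and `sample_preimage_hybrid` are illustrative assumptions, not the paper's.

```python
import math
import random


def sample_z(s, rng):
    """Stand-in for D_{Z,s} (assumption: rounded continuous Gaussian)."""
    return round(rng.gauss(0.0, s / math.sqrt(2 * math.pi)))


def presample_hybrid(ell, s, per_bucket, rng):
    """Offline phase: 2^ell buckets indexed by <h, x> mod 2^ell."""
    mod = 1 << ell
    buckets = {c: [] for c in range(mod)}
    while min(len(b) for b in buckets.values()) < per_bucket:
        x = [sample_z(s, rng) for _ in range(ell)]
        index = sum(xj * 2**j for j, xj in enumerate(x)) % mod
        buckets[index].append(x)
    return buckets


def sample_preimage_hybrid(u, k, ell, buckets):
    """Online phase: k/ell table lookups, each fixing ell entries of x."""
    assert k % ell == 0, "for simplicity, ell must divide k"
    mod = 1 << ell
    out = []
    for _ in range(k // ell):
        chunk = buckets[u % mod].pop()
        out.extend(chunk)
        value = sum(xj * 2**j for j, xj in enumerate(chunk))
        u = (u - value) // mod  # exact, since value = u modulo 2^ell
    return out
```

Larger values of ell reduce the number of sequential iterations to k/ell at the price of a table with 2^ell buckets, which is the trade-off between the two extreme algorithms.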


4.2  Arbitrary  Modulus 


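For a modulus $q$ that is not a power of 2, most of the above ideas still work, with slight adaptations. Let
$k = \lceil \lg(q) \rceil$, so $q < 2^k$. As above, define $\mathbf{g}^t := [1, 2, \ldots, 2^{k-1}] \in \mathbb{Z}_q^{1\times k}$, but now define the matrix

$$\mathbf{S}_k := \begin{bmatrix} 2 & & & & q_0 \\ -1 & 2 & & & q_1 \\ & -1 & \ddots & & \vdots \\ & & \ddots & 2 & q_{k-2} \\ & & & -1 & q_{k-1} \end{bmatrix} \in \mathbb{Z}^{k\times k}$$

where $(q_0, \ldots, q_{k-1}) \in \{0,1\}^k$ is the binary expansion of $q = \sum_i 2^i \cdot q_i$. Again, $\mathbf{S}$ is a basis of $\Lambda^{\perp}(\mathbf{g}^t)$
because $\mathbf{g}^t \cdot \mathbf{S}_k = \mathbf{0} \bmod q$, and $\det(\mathbf{S}_k) = q$. Moreover, the basis vectors have squared length $\|\mathbf{s}_i\|^2 = 5$
for $i < k$ and $\|\mathbf{s}_k\|^2 = \sum_i q_i \leq k$. The next lemma shows that $\mathbf{S}_k$ also has a good Gram-Schmidt
orthogonalization.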

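Lemma 4.3. With $\mathbf{S} = \mathbf{S}_k$ defined as above and orthogonalized in forward order, we have $\|\widetilde{\mathbf{s}}_i\|^2 = \frac{4 - 4^{-i}}{1 - 4^{-i}} \in (4, 5]$
for $1 \leq i < k$, and $\|\widetilde{\mathbf{s}}_k\|^2 = \frac{3q^2}{4^k - 1} < 3$.

Proof. Notice that the vectors $\mathbf{s}_1, \ldots, \mathbf{s}_{k-1}$ are all orthogonal to $\mathbf{g}_k = (1, 2, 4, \ldots, 2^{k-1}) \in \mathbb{Z}^k$. Thus,
the orthogonal component of $\mathbf{s}_k$ has squared length

$$\|\widetilde{\mathbf{s}}_k\|^2 = \frac{\langle \mathbf{s}_k, \mathbf{g}_k \rangle^2}{\|\mathbf{g}_k\|^2} = \frac{q^2}{\sum_{j<k} 4^j} = \frac{3q^2}{4^k - 1}.$$

Similarly, the squared length of $\widetilde{\mathbf{s}}_i$ for $i < k$ can be computed as

$$\|\widetilde{\mathbf{s}}_i\|^2 = 1 + \frac{4^i}{\sum_{j<i} 4^j} = \frac{4 - 4^{-i}}{1 - 4^{-i}}.$$

□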
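The statements about the basis for a non-power-of-two modulus can be checked numerically. The following Python sketch builds the columns of S_k, verifies that each lies in the lattice, and computes exact Gram-Schmidt squared norms with rational arithmetic. The helper names are assumptions made for illustration.

```python
from fractions import Fraction


def basis_columns(q):
    """Columns s_1, ..., s_k of S_k for a modulus q that is not a power of 2."""
    k = (q - 1).bit_length()          # k = ceil(lg q)
    assert q < (1 << k), "q must not be a power of 2"
    cols = []
    for i in range(k - 1):
        col = [0] * k
        col[i], col[i + 1] = 2, -1    # 2 on the diagonal, -1 just below it
        cols.append(col)
    cols.append([(q >> i) & 1 for i in range(k)])  # binary expansion of q
    return cols


def gram_schmidt_sq_norms(cols):
    """Squared norms of the Gram-Schmidt vectors, in forward order."""
    ortho, norms = [], []
    for v in cols:
        w = [Fraction(x) for x in v]
        for b, nb in zip(ortho, norms):
            mu = sum(Fraction(x) * y for x, y in zip(v, b)) / nb
            w = [wi - mu * bi for wi, bi in zip(w, b)]
        ortho.append(w)
        norms.append(sum(wi * wi for wi in w))
    return norms
```

Comparing the returned norms against the closed forms in Lemma 4.3 for a concrete modulus is a quick sanity check on both the matrix layout and the orthogonalization order.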


This concludes the description and analysis of the primitive lattice $\Lambda^{\perp}(\mathbf{g}^t)$ when $q$ is not a power
of 2. Specialized inversion algorithms can be adapted as well, but some care is needed. Of course,
since the lattice dimension $k = O(\log n)$ is very small, one could simply use the general methods of
[Bab85, Kle00, GPV08, Pei10] without worrying too much about optimizations, and satisfy all the claims
made in Theorem 4.1. Below we briefly discuss alternatives for Gaussian sampling.

The offline 'bucketing' approach to Gaussian sampling works without any modification for arbitrary
modulus, with just slightly larger Gaussian parameter $s \geq \sqrt{5} \cdot r$, because it relies only on the smoothing
parameter bound of $\eta_\epsilon(\Lambda^{\perp}(\mathbf{g}^t)) \leq \|\widetilde{\mathbf{S}}_k\| \cdot \omega(\sqrt{\log n})$ and the fact that the number of buckets is $q$. The
randomized nearest-plane approach to sampling does not admit a specialization as simple as the one we have
described for $q = 2^k$. The reason is that while the basis $\mathbf{S}$ is sparse, its orthogonalization $\widetilde{\mathbf{S}}$ is not sparse in
general. (This is in contrast to the case when $q = 2^k$, for which orthogonalizing in reverse order leads to
the sparse matrix $\widetilde{\mathbf{S}} = 2\mathbf{I}$.) Still, $\widetilde{\mathbf{S}}$ is "almost triangular," in the sense that the off-diagonal entries decrease
geometrically as one moves away from the diagonal. This may allow for optimizing the sampling algorithm
by performing "truncated" scalar product computations, and still obtain an almost-Gaussian distribution on the
resulting samples. An interesting alternative is to use a hybrid approach, where one first performs a single
iteration of the randomized nearest-plane algorithm to take care of the last basis vector $\mathbf{s}_k$, and then performs
some variant of the convolution algorithm from [Pei10] to deal with the first $k-1$ basis vectors $[\mathbf{s}_1, \ldots, \mathbf{s}_{k-1}]$,
which have very small lengths and singular values. Notice that the orthogonalized component of the last
vector $\mathbf{s}_k$ is simply a scalar multiple of the primitive vector $\mathbf{g}$, so the scalar product $\langle \widetilde{\mathbf{s}}_k, \mathbf{t} \rangle$ (for any vector $\mathbf{t}$
with syndrome $u = \langle \mathbf{g}, \mathbf{t} \rangle$) can be immediately computed from $u$ as $u/q$ (see Lemma 4.3).


4.3  The  Ring  Setting 

The above constructions and algorithms all transfer easily to compact lattices defined over polynomial rings
(i.e., number rings), as used in the representative works [Mic02, PR06, LM06, LPR10]. A commonly used
example is the cyclotomic ring $R = \mathbb{Z}[x]/(\Phi_m(x))$ where $\Phi_m(x)$ denotes the $m$th cyclotomic polynomial,
which is a monic, degree-$\varphi(m)$, irreducible polynomial whose zeros are all the primitive $m$th roots of unity
in $\mathbb{C}$. The ring $R$ is a $\mathbb{Z}$-module of rank $n$, i.e., it is generated as the additive integer combinations of the
"power basis" elements $1, x, x^2, \ldots, x^{\varphi(m)-1}$. We let $R_q = R/qR$, the ring modulo the ideal generated by an
integer $q$. For geometric concepts like error vectors and Gaussian distributions, it is usually nicest to work
with the "canonical embedding" of $R$, which roughly (but not exactly) corresponds with the "coefficient
embedding," which just considers the vector of coefficients relative to the power basis.

Let $\mathbf{g} \in R^k$ be a primitive vector modulo $q$, i.e., one for which the ideal generated by $q, g_1, \ldots, g_k$ is the
full ring $R$. As above, the vector $\mathbf{g}$ defines functions $f_{\mathbf{g}^t} : R^k \to R_q$ and $g_{\mathbf{g}^t} : R_q \times R^k \to R_q^{1\times k}$, defined as
$f_{\mathbf{g}^t}(\mathbf{x}) = \langle \mathbf{g}, \mathbf{x} \rangle = \sum_{i=1}^{k} g_i \cdot x_i \bmod q$ and $g_{\mathbf{g}^t}(s, \mathbf{e}) = s \cdot \mathbf{g}^t + \mathbf{e}^t \bmod q$, and the related $R$-module

$$qR^k \subseteq \Lambda^{\perp}(\mathbf{g}^t) := \{\mathbf{x} \in R^k : f_{\mathbf{g}^t}(\mathbf{x}) = \langle \mathbf{g}, \mathbf{x} \rangle = 0 \bmod q\} \subseteq R^k,$$

which has index (determinant) $q^n = |R_q|$ as an additive subgroup of $R^k$ because $\mathbf{g}$ is primitive. Concretely,
we can use the exact same primitive vector $\mathbf{g}^t = [1, 2, \ldots, 2^{k-1}] \in R_q^k$ as in Equation (4.1), interpreting its
entries in the ring $R_q$ rather than $\mathbb{Z}_q$.

Inversion and preimage sampling algorithms for $g_{\mathbf{g}^t}$ and $f_{\mathbf{g}^t}$ (respectively) are relatively straightforward
to obtain, by adapting the basic approaches from the previous subsections. These algorithms are simplest
when the power basis elements $1, x, x^2, \ldots, x^{\varphi(m)-1}$ are orthogonal under the canonical embedding (which
is the case exactly when $m$ is a power of 2, and hence $\Phi_m(x) = x^{m/2} + 1$), because the inversion operations
reduce to parallel operations relative to each of the power basis elements. We defer the details to the full
version.


5  Trapdoor  Generation  and  Operations


In this section we describe our new trapdoor generation, inversion and sampling algorithms for hard random
lattices. Recall that these are lattices $\Lambda^{\perp}(\mathbf{A})$ defined by an (almost) uniformly random matrix $\mathbf{A} \in \mathbb{Z}_q^{n\times m}$,
and that the standard notion of a "strong" trapdoor for these lattices (put forward in [GPV08] and used
in a large number of subsequent applications) is a short lattice basis $\mathbf{S} \in \mathbb{Z}^{m\times m}$ for $\Lambda^{\perp}(\mathbf{A})$. There are
several measures of quality for the trapdoor $\mathbf{S}$, the most common ones being (in nondecreasing order):
the maximal Gram-Schmidt length $\|\widetilde{\mathbf{S}}\|$; the maximal Euclidean length $\|\mathbf{S}\|$; and the maximal singular
value $s_1(\mathbf{S})$. Algorithms for generating random lattices together with high-quality trapdoor bases are given
in [Ajt99, AP09]. In this section we give much simpler, faster and tighter algorithms to generate a hard
random lattice with a trapdoor, and to use a trapdoor for performing standard tasks like inverting the LWE
function $g_{\mathbf{A}}$ and sampling preimages for the SIS function $f_{\mathbf{A}}$. We also give a new, simple algorithm for
delegating a trapdoor, i.e., using a trapdoor for $\mathbf{A}$ to obtain one for a matrix $[\mathbf{A} \mid \mathbf{A}']$ that extends $\mathbf{A}$, in a
secure and non-reversible way.

The  following  theorem  summarizes  the  main  results  of  this  section.  Here  we  state  just  one  typical 
instantiation  with  only  asymptotic  bounds.  More  general  results  and  exact  bounds  are  presented  throughout 
the  section. 


Theorem 5.1. There is an efficient randomized algorithm $\mathsf{GenTrap}(1^n, 1^m, q)$ that, given any integers $n \geq 1$,
$q \geq 2$, and sufficiently large $m = O(n \log q)$, outputs a parity-check matrix $\mathbf{A} \in \mathbb{Z}_q^{n\times m}$ and a 'trapdoor' $\mathbf{R}$
such that the distribution of $\mathbf{A}$ is $\mathrm{negl}(n)$-far from uniform. Moreover, there are efficient algorithms Invert
and SampleD that with overwhelming probability over all random choices, do the following:

• For $\mathbf{b}^t = \mathbf{s}^t\mathbf{A} + \mathbf{e}^t$, where $\mathbf{s} \in \mathbb{Z}_q^n$ is arbitrary and either $\|\mathbf{e}\| < q/O(\sqrt{n \log q})$ or $\mathbf{e} \leftarrow D_{\mathbb{Z}^m, \alpha q}$ for
$1/\alpha \geq \sqrt{n \log q} \cdot \omega(\sqrt{\log n})$, the deterministic algorithm $\mathsf{Invert}(\mathbf{R}, \mathbf{A}, \mathbf{b})$ outputs $\mathbf{s}$ and $\mathbf{e}$.

• For any $\mathbf{u} \in \mathbb{Z}_q^n$ and large enough $s = O(\sqrt{n \log q})$, the randomized algorithm $\mathsf{SampleD}(\mathbf{R}, \mathbf{A}, \mathbf{u}, s)$
samples from a distribution within $\mathrm{negl}(n)$ statistical distance of $D_{\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A}),\, s \cdot \omega(\sqrt{\log n})}$.

Throughout this section, we let $\mathbf{G} \in \mathbb{Z}_q^{n\times w}$ denote some fixed primitive matrix that admits efficient
inversion and preimage sampling algorithms, as described in Theorem 4.1. (Recall that typically, $w =
n\lceil \log q \rceil$ for some appropriate base of the logarithm.) All our algorithms and efficiency improvements are
based on the primitive matrix $\mathbf{G}$ and associated algorithms described in Section 4, and a new notion of
trapdoor that we define next.


5.1  A  New  Trapdoor  Notion 

We  begin  by  defining  the  new  notion  of  trapdoor,  establish  some  of  its  most  important  properties,  and  give  a 
simple  and  efficient  algorithm  for  generating  hard  random  lattices  together  with  high-quality  trapdoors. 

Definition 5.2. Let $\mathbf{A} \in \mathbb{Z}_q^{n\times m}$ and $\mathbf{G} \in \mathbb{Z}_q^{n\times w}$ be matrices with $m \geq w \geq n$. A G-trapdoor for $\mathbf{A}$ is a
matrix $\mathbf{R} \in \mathbb{Z}^{(m-w)\times w}$ such that $\mathbf{A}\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right] = \mathbf{H}\mathbf{G}$ for some invertible matrix $\mathbf{H} \in \mathbb{Z}_q^{n\times n}$. We refer to $\mathbf{H}$ as the
tag or label of the trapdoor. The quality of the trapdoor is measured by its largest singular value $s_1(\mathbf{R})$.

We remark that, by definition of G-trapdoor, if $\mathbf{G}$ is a primitive matrix and $\mathbf{A}$ admits a G-trapdoor, then
$\mathbf{A}$ is primitive as well. In particular, $\det(\Lambda^{\perp}(\mathbf{A})) = q^n$. Since the primitive matrix $\mathbf{G}$ is typically fixed and
public, we usually omit references to it, and refer to G-trapdoors simply as trapdoors. We remark that since
$\mathbf{G}$ is primitive, the tag $\mathbf{H}$ in the above definition is uniquely determined by (and efficiently computable from)
$\mathbf{A}$ and the trapdoor $\mathbf{R}$.

The following lemma says that a good basis for $\Lambda^{\perp}(\mathbf{A})$ may be obtained from knowledge of $\mathbf{R}$. We
do  not  use  the  lemma  anywhere  in  the  rest  of  the  paper,  but  include  it  here  primarily  to  show  that  our  new 
definition  of  trapdoor  is  at  least  as  powerful  as  the  traditional  one  of  a  short  basis.  Our  algorithms  for  Gaussian 
sampling  and  LWE  inversion  do  not  need  a  full  basis,  and  make  direct  (and  more  efficient)  use  of  our  new 
notion  of  trapdoor. 

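Lemma 5.3. Let $\mathbf{S} \in \mathbb{Z}^{w\times w}$ be any basis for $\Lambda^{\perp}(\mathbf{G})$. Let $\mathbf{A} \in \mathbb{Z}_q^{n\times m}$ have trapdoor $\mathbf{R} \in \mathbb{Z}^{(m-w)\times w}$ with
tag $\mathbf{H} \in \mathbb{Z}_q^{n\times n}$. Then the lattice $\Lambda^{\perp}(\mathbf{A})$ is generated by the basis

$$\mathbf{S}_{\mathbf{A}} = \begin{bmatrix} \mathbf{I} & \mathbf{R} \\ \mathbf{0} & \mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{W} & \mathbf{S} \end{bmatrix},$$

where $\mathbf{W} \in \mathbb{Z}^{w\times \bar{m}}$ (with $\bar{m} = m - w$) is an arbitrary solution to $\mathbf{G}\mathbf{W} = -\mathbf{H}^{-1}\mathbf{A}[\mathbf{I} \mid \mathbf{0}]^t \bmod q$. Moreover, the basis $\mathbf{S}_{\mathbf{A}}$
satisfies $\|\widetilde{\mathbf{S}}_{\mathbf{A}}\| \leq s_1\!\left(\left[\begin{smallmatrix}\mathbf{I} & \mathbf{R}\\ \mathbf{0} & \mathbf{I}\end{smallmatrix}\right]\right) \cdot \|\widetilde{\mathbf{S}}\| \leq (s_1(\mathbf{R}) + 1) \cdot \|\widetilde{\mathbf{S}}\|$, when $\mathbf{S}_{\mathbf{A}}$ is orthogonalized in suitable order.

Proof. It is immediate to check that $\mathbf{A} \cdot \mathbf{S}_{\mathbf{A}} = \mathbf{0} \bmod q$, so $\mathbf{S}_{\mathbf{A}}$ generates a sublattice of $\Lambda^{\perp}(\mathbf{A})$. In fact, it
generates the entire lattice because $\det(\mathbf{S}_{\mathbf{A}}) = \det(\mathbf{S}) = q^n = \det(\Lambda^{\perp}(\mathbf{A}))$.

The bound on $\|\widetilde{\mathbf{S}}_{\mathbf{A}}\|$ follows by simple linear algebra. Recall from Lemma 2.1 that $\|\widetilde{\mathbf{B}}\| = \|\widetilde{\mathbf{S}}\|$
when the columns of $\mathbf{B} = \left[\begin{smallmatrix}\mathbf{I} & \mathbf{0}\\ \mathbf{W} & \mathbf{S}\end{smallmatrix}\right]$ are reordered appropriately. So it suffices to show that $\|\widetilde{\mathbf{T}\mathbf{B}}\| \leq
s_1(\mathbf{T}) \cdot \|\widetilde{\mathbf{B}}\|$ for any $\mathbf{T}, \mathbf{B}$. Let $\mathbf{B} = \mathbf{Q}\mathbf{D}\mathbf{U}$ and $\mathbf{T}\mathbf{B} = \mathbf{Q}'\mathbf{D}'\mathbf{U}'$ be Gram-Schmidt decompositions of $\mathbf{B}$
and $\mathbf{T}\mathbf{B}$, respectively, with $\mathbf{Q}, \mathbf{Q}'$ orthogonal, $\mathbf{D}, \mathbf{D}'$ diagonal with nonnegative entries, and $\mathbf{U}, \mathbf{U}'$ upper
unitriangular. We have

$$\mathbf{T}\mathbf{Q}\mathbf{D}\mathbf{U} = \mathbf{Q}'\mathbf{D}'\mathbf{U}' \implies \mathbf{T}'\mathbf{D} = \mathbf{D}'\mathbf{U}'',$$

where $\mathbf{T} = \mathbf{Q}'\mathbf{T}'\mathbf{Q}^{-1} \implies s_1(\mathbf{T}') = s_1(\mathbf{T})$, and $\mathbf{U}''$ is upper unitriangular because such matrices form a
multiplicative group. Now every row of $\mathbf{T}'\mathbf{D}$ has Euclidean norm at most $s_1(\mathbf{T}) \cdot \|\mathbf{D}\| = s_1(\mathbf{T}) \cdot \|\widetilde{\mathbf{B}}\|$,
while the $i$th row of $\mathbf{D}'\mathbf{U}''$ has norm at least $d'_{i,i}$, the $i$th diagonal of $\mathbf{D}'$. We conclude that $\|\widetilde{\mathbf{T}\mathbf{B}}\| = \|\mathbf{D}'\| \leq
s_1(\mathbf{T}) \cdot \|\widetilde{\mathbf{B}}\|$, as desired. □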

We also make the following simple but useful observations:

• The rows of $\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]$ in Definition 5.2 can appear in any order, since this just induces a permutation of $\mathbf{A}$'s
columns.

• If $\mathbf{R}$ is a trapdoor for $\mathbf{A}$, then it can be made into an equally good trapdoor for any extension $[\mathbf{A} \mid \mathbf{B}]$,
by padding $\mathbf{R}$ with zero rows; this leaves $s_1(\mathbf{R})$ unchanged.

• If $\mathbf{R}$ is a trapdoor for $\mathbf{A}$ with tag $\mathbf{H}$, then $\mathbf{R}$ is also a trapdoor for $\mathbf{A}' = \mathbf{A} - [\mathbf{0} \mid \mathbf{H}'\mathbf{G}]$ with tag
$(\mathbf{H} - \mathbf{H}')$ for any $\mathbf{H}' \in \mathbb{Z}_q^{n\times n}$, as long as $(\mathbf{H} - \mathbf{H}')$ is invertible modulo $q$. This is the main idea
behind the compact IBE of [ABB10a], and can be used to give a family of "tag-based" trapdoor
functions [KMO10]. In Section 6 we give explicit families of matrices $\mathbf{H}$ having suitable properties
for applications.
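The last observation follows in one line from Definition 5.2. As a short worked check (a sketch that uses only the defining relation of a G-trapdoor with tag H, and the fact that the zero block multiplies R):

```latex
\begin{align*}
\mathbf{A}' \begin{bmatrix} \mathbf{R} \\ \mathbf{I} \end{bmatrix}
  &= \mathbf{A} \begin{bmatrix} \mathbf{R} \\ \mathbf{I} \end{bmatrix}
   - [\mathbf{0} \mid \mathbf{H}'\mathbf{G}] \begin{bmatrix} \mathbf{R} \\ \mathbf{I} \end{bmatrix} \\
  &= \mathbf{H}\mathbf{G} - \mathbf{H}'\mathbf{G}
   = (\mathbf{H} - \mathbf{H}')\,\mathbf{G} \pmod q .
\end{align*}
```

Hence the same matrix R satisfies the trapdoor relation for A' with tag H - H', which is a valid tag exactly when H - H' is invertible modulo q.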


5.2  Trapdoor  Generation 

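We now give an algorithm to generate a (pseudo)random matrix $\mathbf{A}$ together with a G-trapdoor. The algorithm
is straightforward, and in fact it can be easily derived from the definition of G-trapdoor itself. A random
lattice is built by first extending the primitive matrix $\mathbf{G}$ into a semi-random matrix $\mathbf{A}' = [\bar{\mathbf{A}} \mid \mathbf{H}\mathbf{G}]$
(where $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n\times\bar{m}}$ is chosen at random, and $\mathbf{H} \in \mathbb{Z}_q^{n\times n}$ is the desired tag), and then applying a random
transformation $\mathbf{T} = \left[\begin{smallmatrix}\mathbf{I} & \mathbf{R}\\ \mathbf{0} & \mathbf{I}\end{smallmatrix}\right] \in \mathbb{Z}^{m\times m}$ to the semi-random lattice $\Lambda^{\perp}(\mathbf{A}')$. Since $\mathbf{T}$ is unimodular with inverse
$\mathbf{T}^{-1} = \left[\begin{smallmatrix}\mathbf{I} & -\mathbf{R}\\ \mathbf{0} & \mathbf{I}\end{smallmatrix}\right]$, by Lemma 2.1 this yields the lattice $\mathbf{T} \cdot \Lambda^{\perp}(\mathbf{A}') = \Lambda^{\perp}(\mathbf{A}' \cdot \mathbf{T}^{-1})$ associated with the
parity-check matrix $\mathbf{A} = \mathbf{A}' \cdot \mathbf{T}^{-1} = [\bar{\mathbf{A}} \mid \mathbf{H}\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$. Moreover, the distribution of $\mathbf{A}$ is close to uniform
(either statistically, or computationally) as long as the distribution of $[\bar{\mathbf{A}} \mid \mathbf{0}]\mathbf{T}^{-1} = [\bar{\mathbf{A}} \mid -\bar{\mathbf{A}}\mathbf{R}]$ is. For
details, see Algorithm 1, whose correctness is immediate.


Algorithm 1 Efficient algorithm $\mathsf{GenTrap}^{\mathcal{D}}(\bar{\mathbf{A}}, \mathbf{H})$ for generating a parity-check matrix $\mathbf{A}$ with trapdoor $\mathbf{R}$.
Input: Matrix $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n\times\bar{m}}$ for some $\bar{m} \geq 1$, invertible matrix $\mathbf{H} \in \mathbb{Z}_q^{n\times n}$, and distribution $\mathcal{D}$ over $\mathbb{Z}^{\bar{m}\times w}$.
(If no particular $\bar{\mathbf{A}}$, $\mathbf{H}$ are given as input, then the algorithm may choose them itself, e.g., picking
$\bar{\mathbf{A}} \in \mathbb{Z}_q^{n\times\bar{m}}$ uniformly at random, and setting $\mathbf{H} = \mathbf{I}$.)

Output: A parity-check matrix $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{A}_1] \in \mathbb{Z}_q^{n\times m}$, where $m = \bar{m} + w$, and trapdoor $\mathbf{R}$ with tag $\mathbf{H}$.

1: Choose a matrix $\mathbf{R} \in \mathbb{Z}^{\bar{m}\times w}$ from distribution $\mathcal{D}$.

2: Output $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{H}\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}] \in \mathbb{Z}_q^{n\times m}$ and trapdoor $\mathbf{R} \in \mathbb{Z}^{\bar{m}\times w}$.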
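Algorithm 1 is short enough to sketch directly. The Python below is an illustration under stated assumptions: it fixes G to be the block-diagonal matrix built from [1, 2, ..., 2^(k-1)], draws the entries of R as the difference of two fair bits (so 0 with probability 1/2 and each of +1, -1 with probability 1/4), and uses plain lists rather than a linear-algebra library. The function names and parameter choices are hypothetical and make no claim about secure parameter sizes.

```python
import random


def gadget(n, k):
    """G = I_n (tensor) [1, 2, ..., 2^(k-1)], as n rows of length n*k."""
    return [[(1 << (j % k)) if j // k == i else 0 for j in range(n * k)]
            for i in range(n)]


def matmul_mod(X, Y, q):
    """Product of two list-of-rows matrices, reduced modulo q."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) % q for col in cols]
            for row in X]


def gen_trap(n, k, q, m_bar, rng, H=None):
    """Return (A, R, H) with A = [A_bar | H G - A_bar R] mod q."""
    w = n * k
    if H is None:  # default tag: the identity matrix
        H = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    A_bar = [[rng.randrange(q) for _ in range(m_bar)] for _ in range(n)]
    # Step 1: choose R from the distribution (here: entries in {-1, 0, 1}).
    R = [[rng.getrandbits(1) - rng.getrandbits(1) for _ in range(w)]
         for _ in range(m_bar)]
    # Step 2: assemble the parity-check matrix.
    HG = matmul_mod(H, gadget(n, k), q)
    AR = matmul_mod(A_bar, R, q)
    A = [A_bar[i] + [(HG[i][j] - AR[i][j]) % q for j in range(w)]
         for i in range(n)]
    return A, R, H
```

Stacking R on top of a w-by-w identity and multiplying by A should reproduce HG modulo q, which is exactly the trapdoor relation of Definition 5.2.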


We next describe two types of GenTrap instantiations. The first type generates a trapdoor $\mathbf{R}$ for a
statistically near-uniform output matrix $\mathbf{A}$ using dimension $\bar{m} \approx n \log q$ or less (there is a trade-off between
$\bar{m}$ and the trapdoor quality $s_1(\mathbf{R})$). The second type generates a computationally pseudorandom $\mathbf{A}$ (under
the LWE assumption) using dimension $\bar{m} = 2n$; this pseudorandom construction is the first of its kind in the
literature. Certain applications allow for an optimization that decreases $\bar{m}$ by an additive $n$ term; this is most
significant in the computationally secure construction because it yields $\bar{m} = n$.


Statistical instantiation. This instantiation works for any parameter $\bar{m}$ and distribution $\mathcal{D}$ over $\mathbb{Z}^{\bar{m}\times w}$
having the following two properties:

1. Subgaussianity: $\mathcal{D}$ is subgaussian with some parameter $s > 0$ (or $\delta$-subgaussian for some small $\delta$).
This implies by Lemma 2.9 that $\mathbf{R} \leftarrow \mathcal{D}$ has $s_1(\mathbf{R}) = s \cdot O(\sqrt{\bar{m}} + \sqrt{w})$, except with probability
$2^{-\Omega(\bar{m}+w)}$. (Recall that the constant factor hidden in the $O(\cdot)$ expression is $\approx 1/\sqrt{2\pi}$.)

2. Regularity: for $\bar{\mathbf{A}} \leftarrow \mathbb{Z}_q^{n\times\bar{m}}$ and $\mathbf{R} \leftarrow \mathcal{D}$, $\mathbf{A} = [\bar{\mathbf{A}} \mid \bar{\mathbf{A}}\mathbf{R}]$ is $\delta$-uniform for some $\delta = \mathrm{negl}(n)$.

In fact, there is no loss in security if $\bar{\mathbf{A}}$ contains an identity matrix $\mathbf{I}$ as a submatrix and is otherwise
uniform, since this corresponds with the Hermite normal form of the SIS and LWE problems. See,
e.g., [MR09, Section 5] for further details.

For example, let $\mathcal{D} = \mathcal{P}^{\bar{m}\times w}$ where $\mathcal{P}$ is the distribution over $\mathbb{Z}$ that outputs 0 with probability 1/2, and $\pm 1$
each with probability 1/4. Then $\mathcal{P}$ (and hence $\mathcal{D}$) is 0-subgaussian with parameter $\sqrt{2\pi}$, and satisfies the
regularity condition (for any $q$) for $\delta \leq \frac{w}{2}\sqrt{q^n/2^{\bar{m}}}$, by a version of the leftover hash lemma (see, e.g., [AP09,
Section 2.2.1]). Therefore, we can use any $\bar{m} \geq n \lg q + 2\lg\frac{w}{2\delta}$.

As another important example, let $\mathcal{D} = D_{\mathbb{Z},s}^{\bar{m}\times w}$ be a discrete Gaussian distribution for some $s \geq \eta_\epsilon(\mathbb{Z})$
and $\epsilon = \mathrm{negl}(n)$. Then $\mathcal{D}$ is 0-subgaussian with parameter $s$ by Lemma 2.8, and satisfies the regularity
condition when $\bar{m}$ satisfies the bound from Lemma 2.4. For example, letting $s = 2\eta_\epsilon(\mathbb{Z})$ we can use
any $\bar{m} = n \lg q + \omega(\log n)$. (Other tradeoffs between $s$ and $\bar{m}$ are possible, potentially using a different
choice of $\mathbf{G}$, and more exact bounds on the error probabilities can be worked out from the lemma statements.)
Moreover, by Lemmas 2.4 and 2.8 we have that with overwhelming probability over the choice of $\bar{\mathbf{A}}$, the
conditional distribution of $\mathbf{R}$ given $\mathbf{A} = [\bar{\mathbf{A}} \mid \bar{\mathbf{A}}\mathbf{R}]$ is $\mathrm{negl}(n)$-subgaussian with parameter $s$. We will use
this fact in some of our applications in Section 6.


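Computational instantiation. Let $\bar{\mathbf{A}} = [\mathbf{I} \mid \hat{\mathbf{A}}] \in \mathbb{Z}_q^{n\times\bar{m}}$ for $\bar{m} = 2n$, and let $\mathcal{D} = D_{\mathbb{Z},s}^{\bar{m}\times w}$ for some
$s = \alpha q$, where $\alpha > 0$ is an LWE relative error rate (and typically $\alpha q > \sqrt{n}$). Clearly, $\mathcal{D}$ is 0-subgaussian
with parameter $\alpha q$. Also, $[\hat{\mathbf{A}} \mid \bar{\mathbf{A}}\mathbf{R} = \hat{\mathbf{A}}\mathbf{R}_2 + \mathbf{R}_1]$ for $\mathbf{R} = \left[\begin{smallmatrix}\mathbf{R}_1\\ \mathbf{R}_2\end{smallmatrix}\right] \leftarrow \mathcal{D}$ is exactly an instance of decision-
$\mathrm{LWE}_{n,q,\alpha}$ (in its normal form), and hence is pseudorandom (ignoring the identity submatrix) assuming that
the problem is hard.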


Further optimizations. If an application only uses a single tag $\mathbf{H} = \mathbf{I}$ (as is the case with, for example,
GPV signatures [GPV08]), then we can save an additive $n$ term in the dimension $\bar{m}$ (and hence in the total
dimension $m$): instead of putting an identity submatrix in $\bar{\mathbf{A}}$, we can instead use the identity submatrix from
$\mathbf{G}$ (which exists without loss of generality, since $\mathbf{G}$ is primitive) and conceal the remainder of $\mathbf{G}$ using either
of the above methods.

All of the above ideas also translate immediately to the ring setting (see Section 4.3), using an appropriate
regularity lemma (e.g., the one in [LPR10]) for a statistical instantiation, and the ring-LWE problem for a
computationally secure instantiation.

5.3  LWE  Inversion 

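Algorithm 2 below shows how to use a trapdoor to solve LWE relative to $\mathbf{A}$. Given a trapdoor $\mathbf{R}$ for
$\mathbf{A} \in \mathbb{Z}_q^{n\times m}$ and an LWE instance $\mathbf{b}^t = \mathbf{s}^t\mathbf{A} + \mathbf{e}^t \bmod q$ for some short error vector $\mathbf{e} \in \mathbb{Z}^m$, the algorithm
recovers $\mathbf{s}$ (and $\mathbf{e}$). This naturally yields an inversion algorithm for the injective trapdoor function $g_{\mathbf{A}}(\mathbf{s}, \mathbf{e}) =
\mathbf{s}^t\mathbf{A} + \mathbf{e}^t \bmod q$, which is hard to invert (and whose output is pseudorandom) if LWE is hard.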


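Algorithm 2 Efficient algorithm $\mathsf{Invert}^{\mathcal{O}}(\mathbf{R}, \mathbf{A}, \mathbf{b})$ for inverting the function $g_{\mathbf{A}}(\mathbf{s}, \mathbf{e})$.

Input: An oracle $\mathcal{O}$ for inverting the function $g_{\mathbf{G}}(\hat{\mathbf{s}}, \hat{\mathbf{e}})$ when $\hat{\mathbf{e}} \in \mathbb{Z}^w$ is suitably small.

• parity-check matrix $\mathbf{A} \in \mathbb{Z}_q^{n\times m}$;

• G-trapdoor $\mathbf{R} \in \mathbb{Z}^{\bar{m}\times kn}$ for $\mathbf{A}$ with invertible tag $\mathbf{H} \in \mathbb{Z}_q^{n\times n}$;

• vector $\mathbf{b}^t = g_{\mathbf{A}}(\mathbf{s}, \mathbf{e}) = \mathbf{s}^t\mathbf{A} + \mathbf{e}^t$ for any $\mathbf{s} \in \mathbb{Z}_q^n$ and suitably small $\mathbf{e} \in \mathbb{Z}^m$.
Output: The vectors $\mathbf{s}$ and $\mathbf{e}$.

1: Compute $\hat{\mathbf{b}}^t = \mathbf{b}^t \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]$.

2: Get $(\hat{\mathbf{s}}, \hat{\mathbf{e}}) \leftarrow \mathcal{O}(\hat{\mathbf{b}})$.

3: return $\mathbf{s} = \mathbf{H}^{-t}\hat{\mathbf{s}}$ and $\mathbf{e} = \mathbf{b} - \mathbf{A}^t\mathbf{s}$ (interpreted as a vector in $\mathbb{Z}^m$ with entries in $[-\frac{q}{2}, \frac{q}{2})$).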
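The three steps of Algorithm 2 can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes tag H = I, modulus q = 2^k, and G built from blocks [1, 2, ..., 2^(k-1)]; the oracle is a simple bit-by-bit decoder (a hypothetical helper) that is correct whenever every entry of the transformed error lies in [-q/4, q/4).

```python
def oracle_invert_g(b_hat, n, k):
    """Recover s from b_hat = s^t G + e_hat^t mod q, for q = 2^k.

    Coordinate j of a block equals 2^j * s + e_j, so the top coordinate
    reveals the lowest bit of s, the next one the second bit, and so on.
    """
    q = 1 << k
    s = []
    for i in range(n):
        block = b_hat[i * k:(i + 1) * k]
        low = 0  # bits of s_i recovered so far
        for bit in range(k):
            j = k - 1 - bit
            t = (block[j] - (low << j)) % q   # = 2^(k-1) * s_bit + e_j mod q
            if q // 4 <= t < 3 * q // 4:      # closer to q/2 than to 0
                low |= 1 << bit
        s.append(low)
    return s


def invert(R, A, b, n, k):
    """Algorithm 2 with tag H = I: returns (s, e) from b^t = s^t A + e^t mod q."""
    q = 1 << k
    m_bar, w = len(R), n * k
    # Step 1: b_hat^t = b^t [R; I].
    b_hat = [(sum(b[i] * R[i][j] for i in range(m_bar)) + b[m_bar + j]) % q
             for j in range(w)]
    # Step 2: call the oracle for the primitive matrix G.
    s = oracle_invert_g(b_hat, n, k)
    # Step 3: e = b - A^t s, with entries centered in [-q/2, q/2).
    e = []
    for col in range(m_bar + w):
        d = (b[col] - sum(s[r] * A[r][col] for r in range(n))) % q
        e.append(d - q if d >= q // 2 else d)
    return s, e
```

To try it, build A = [A_bar | G - A_bar R] mod q for a small R with entries in {-1, 0, 1}, form b from a random s and a small error e, and check that `invert` returns both. The transformed error has entries bounded by roughly (m_bar + 1) times the largest entry of e, so small toy parameters stay well inside the decoding radius.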


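Theorem 5.4. Suppose that oracle $\mathcal{O}$ in Algorithm 2 correctly inverts $g_{\mathbf{G}}(\hat{\mathbf{s}}, \hat{\mathbf{e}})$ for any error vector $\hat{\mathbf{e}} \in
\mathcal{P}_{1/2}(q \cdot \mathbf{B}^{-t})$ for some $\mathbf{B}$. Then for any $\mathbf{s}$ and $\mathbf{e}$ of length $\|\mathbf{e}\| < q/(2\|\mathbf{B}\|s)$ where $s = \sqrt{s_1(\mathbf{R})^2 + 1}$,
Algorithm 2 correctly inverts $g_{\mathbf{A}}(\mathbf{s}, \mathbf{e})$. Moreover, for any $\mathbf{s}$ and random $\mathbf{e} \leftarrow D_{\mathbb{Z}^m, \alpha q}$ where $1/\alpha \geq
2\|\mathbf{B}\|s \cdot \omega(\sqrt{\log n})$, the algorithm inverts successfully with overwhelming probability over the choice of $\mathbf{e}$.

Note that using our constructions from Section 4, we can implement $\mathcal{O}$ so that either $\|\mathbf{B}\| = 2$ (for $q$ a
power of 2, where $\mathbf{B} = \widetilde{\mathbf{S}} = 2\mathbf{I}$) or $\|\mathbf{B}\| = \sqrt{5}$ (for arbitrary $q$).

Proof. Let $\bar{\mathbf{R}} = [\mathbf{R}^t \;\; \mathbf{I}]$, and note that $s = s_1(\bar{\mathbf{R}})$. By the above description, the algorithm works correctly
when $\bar{\mathbf{R}}\mathbf{e} \in \mathcal{P}_{1/2}(q \cdot \mathbf{B}^{-t})$; equivalently, when $(\mathbf{b}_i^t\bar{\mathbf{R}})\mathbf{e}/q \in [-\frac{1}{2}, \frac{1}{2})$ for all $i$. By definition of $s$, we have
$\|\mathbf{b}_i^t\bar{\mathbf{R}}\| \leq s\|\mathbf{B}\|$. If $\|\mathbf{e}\| < q/(2\|\mathbf{B}\|s)$, then $|(\mathbf{b}_i^t\bar{\mathbf{R}})\mathbf{e}/q| < 1/2$ by Cauchy-Schwarz. Moreover, if $\mathbf{e}$ is
chosen at random from $D_{\mathbb{Z}^m, \alpha q}$, then by the fact that $\mathbf{e}$ is 0-subgaussian (Lemma 2.8) with parameter $\alpha q$, the
probability that $|(\mathbf{b}_i^t\bar{\mathbf{R}})\mathbf{e}/q| \geq 1/2$ is negligible, and the second claim follows by the union bound. □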


5.4  Gaussian  Sampling 

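Here we show how to use a trapdoor for efficient Gaussian preimage sampling for the function $f_{\mathbf{A}}$, i.e.,
sampling from a discrete Gaussian over a desired coset of $\Lambda^{\perp}(\mathbf{A})$. Our precise goal is, given a G-trapdoor $\mathbf{R}$
(with tag $\mathbf{H}$) for matrix $\mathbf{A}$ and a syndrome $\mathbf{u} \in \mathbb{Z}_q^n$, to sample from the spherical discrete Gaussian $D_{\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A}),s}$
for relatively small parameter $s$. As we show next, this task can be reduced, via some efficient pre- and
post-processing, to sampling from any sufficiently narrow (not necessarily spherical) Gaussian over the
primitive lattice $\Lambda^{\perp}(\mathbf{G})$.

The main ideas behind our algorithm, which is described formally in Algorithm 3, are as follows. For
simplicity, suppose that $\mathbf{R}$ has tag $\mathbf{H} = \mathbf{I}$, so $\mathbf{A}\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right] = \mathbf{G}$, and suppose we have a subroutine for Gaussian
sampling from any desired coset of $\Lambda^{\perp}(\mathbf{G})$ with some small, fixed parameter $\sqrt{\Sigma_{\mathbf{G}}} \geq \eta_\epsilon(\Lambda^{\perp}(\mathbf{G}))$. For
example, Section 4 describes algorithms for which $\sqrt{\Sigma_{\mathbf{G}}}$ is either 2 or $\sqrt{5}$. (Throughout this summary we
omit the small rounding factor $r = \omega(\sqrt{\log n})$ from all Gaussian parameters.) The algorithm for sampling
from a coset $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$ follows from two main observations:

1. If we sample a Gaussian $\mathbf{z}$ with parameter $\sqrt{\Sigma_{\mathbf{G}}}$ from $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{G})$ and produce $\mathbf{y} = \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\mathbf{z}$, then $\mathbf{y}$ is
Gaussian over the (non-full-rank) set $\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\Lambda^{\perp}_{\mathbf{u}}(\mathbf{G}) \subsetneq \Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$ with parameter $\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\sqrt{\Sigma_{\mathbf{G}}}$ (i.e., covariance
$\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\Sigma_{\mathbf{G}}[\mathbf{R}^t \;\; \mathbf{I}]$). The (strict) inclusion holds because for any $\mathbf{y} = \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\mathbf{z}$ where $\mathbf{z} \in \Lambda^{\perp}_{\mathbf{u}}(\mathbf{G})$, we have

$$\mathbf{A}\mathbf{y} = \left(\mathbf{A}\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\right)\mathbf{z} = \mathbf{G}\mathbf{z} = \mathbf{u}.$$

Note that $s_1\!\left(\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right] \cdot \sqrt{\Sigma_{\mathbf{G}}}\right) \leq s_1\!\left(\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\right) \cdot s_1(\sqrt{\Sigma_{\mathbf{G}}}) \leq \sqrt{s_1(\mathbf{R})^2 + 1} \cdot s_1(\sqrt{\Sigma_{\mathbf{G}}})$, so $\mathbf{y}$'s distribution is
only about an $s_1(\mathbf{R})$ factor wider than that of $\mathbf{z}$ over $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{G})$. However, $\mathbf{y}$ lies in a non-full-rank subset
of $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$, and its distribution is 'skewed' (non-spherical). This leaks information about the trapdoor
$\mathbf{R}$, so we cannot just output $\mathbf{y}$.

2. To sample from a spherical Gaussian over all of $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$, we use the 'convolution' technique from [Pei10]
to correct for the above-described problems with the distribution of $\mathbf{y}$. Specifically, we first choose a
Gaussian perturbation $\mathbf{p} \in \mathbb{Z}^m$ having covariance $s^2 - \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\Sigma_{\mathbf{G}}[\mathbf{R}^t \;\; \mathbf{I}]$, which is well-defined as long
as $s \geq s_1\!\left(\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right] \cdot \sqrt{\Sigma_{\mathbf{G}}}\right)$. We then sample $\mathbf{y} = \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\mathbf{z}$ as above for an adjusted syndrome $\mathbf{v} = \mathbf{u} - \mathbf{A}\mathbf{p}$,
and output $\mathbf{x} = \mathbf{p} + \mathbf{y}$. Now the support of $\mathbf{x}$ is all of $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$, and because the covariances of $\mathbf{p}$ and $\mathbf{y}$
are additive (subject to some mild hypotheses), the overall distribution of $\mathbf{x}$ is spherical with Gaussian
parameter $s$ that can be as small as $s \approx s_1(\mathbf{R}) \cdot s_1(\sqrt{\Sigma_{\mathbf{G}}})$.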
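Both claims in the second observation can be checked by a short calculation, still in the simplified case H = I. The block below is an informal sketch: it treats the covariances as adding exactly, whereas for discrete Gaussians this holds only up to the "mild hypotheses" mentioned above.

```latex
\begin{align*}
\mathbf{A}\mathbf{x}
  &= \mathbf{A}\mathbf{p} + \mathbf{A}\begin{bmatrix}\mathbf{R}\\ \mathbf{I}\end{bmatrix}\mathbf{z}
   = \mathbf{A}\mathbf{p} + \mathbf{G}\mathbf{z}
   = \mathbf{A}\mathbf{p} + (\mathbf{u} - \mathbf{A}\mathbf{p})
   = \mathbf{u} \pmod q, \\[4pt]
\operatorname{Cov}(\mathbf{p}) + \operatorname{Cov}(\mathbf{y})
  &= \Bigl(s^2\mathbf{I}
     - \begin{bmatrix}\mathbf{R}\\ \mathbf{I}\end{bmatrix}\Sigma_{\mathbf{G}}\,[\mathbf{R}^t \;\; \mathbf{I}]\Bigr)
     + \begin{bmatrix}\mathbf{R}\\ \mathbf{I}\end{bmatrix}\Sigma_{\mathbf{G}}\,[\mathbf{R}^t \;\; \mathbf{I}]
   = s^2\mathbf{I}.
\end{align*}
```

The first line shows that x always has the requested syndrome u; the second shows why the perturbation covariance is chosen as the complement of the skewed covariance of y.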


Quality analysis. Algorithm 3 can sample from a discrete Gaussian with parameter $s \cdot \omega(\sqrt{\log n})$ where
$s$ can be as small as $\sqrt{s_1(\mathbf{R})^2 + 1} \cdot \sqrt{s_1(\Sigma_{\mathbf{G}}) + 2}$. We stress that this is only very slightly larger (a
factor of at most $\sqrt{6/4} \leq 1.23$) than the bound $(s_1(\mathbf{R}) + 1) \cdot \|\widetilde{\mathbf{S}}\|$ from Lemma 5.3 on the largest
Gram-Schmidt norm of a lattice basis derived from the trapdoor $\mathbf{R}$. (Recall that our constructions from
Section 4 give $s_1(\Sigma_{\mathbf{G}}) = \|\widetilde{\mathbf{S}}\|^2 = 4$ or 5.) In the iterative "randomized nearest-plane" sampling algorithm
of [Kle00, GPV08], the Gaussian parameter $s$ is lower-bounded by the largest Gram-Schmidt norm of the
orthogonalized input basis (times the same $\omega(\sqrt{\log n})$ factor used in our algorithm). Therefore, the efficiency
and parallelism of Algorithm 3 comes at almost no cost in quality versus slower, iterative algorithms that use
high-precision arithmetic. (It seems very likely that the corresponding small loss in security can easily be
mitigated with slightly larger parameters, while still yielding a significant net gain in performance.)

Runtime  analysis.  We  now  analyze  the  computational  cost  of  Algorithm  [3]  with  a  focus  on  optimizing  the 
online  runtime  and  parallelism  (sometimes  at  the  expense  of  the  offline  phase,  which  we  do  not  attempt  to 
optimize). 

The offline phase is dominated by sampling from $D_{\mathbb{Z}^m, r\sqrt{\Sigma}}$ for some fixed (typically non-spherical)
covariance matrix $\Sigma > \mathbf{I}$. By [Pei10, Theorem 3.1], this can be accomplished (up to any desired statistical
distance) simply by sampling a continuous Gaussian $D_{r\sqrt{\Sigma - \mathbf{I}}}$ with sufficient precision, then independently
randomized-rounding each entry of the sampled vector to $\mathbb{Z}$ using Gaussian parameter $r \geq \eta_\epsilon(\mathbb{Z})$.

Naively, the online work is dominated by the computation of $\mathbf{H}^{-1}(\mathbf{u} - \bar{\mathbf{w}})$ and $\mathbf{R}\mathbf{z}$ (plus the call to
$\mathcal{O}(\mathbf{v})$, which as described in Section 4 requires only $O(\log^c n)$ work, or one table lookup, by each of $n$
processors in parallel). In general, the first computation takes $O(n^2)$ scalar multiplications and additions
in $\mathbb{Z}_q$, while the latter takes $O(\bar{m} \cdot w)$, which is typically $O(n^2 \log^2 q)$. (Obviously, both computations are
perfectly parallelizable.) However, the special form of $\mathbf{z}$, and often of $\mathbf{H}$, allow for some further asymptotic
and practical optimizations: since $\mathbf{z}$ is typically produced by concatenating $n$ independent dimension-$k$
subvectors that are sampled offline, we can precompute much of $\mathbf{R}\mathbf{z}$ by pre-multiplying each subvector by
each of the $n$ blocks of $k$ columns in $\mathbf{R}$. This reduces the online computation of $\mathbf{R}\mathbf{z}$ to the summation of $n$
dimension-$\bar{m}$ vectors, or $O(n^2 \log q)$ scalar additions (and no multiplications) in $\mathbb{Z}_q$. As for multiplication by
$\mathbf{H}^{-1}$, in some applications (like GPV signatures) $\mathbf{H}$ is always the identity $\mathbf{I}$, in which case multiplication is
unnecessary; in all other applications we know of, $\mathbf{H}$ actually represents multiplication in a certain extension
field/ring of $\mathbb{Z}_q$, which can be computed in $O(n \log n)$ scalar operations and depth $O(\log n)$. In conclusion,
the asymptotic cost of the online phase is still dominated by computing $\mathbf{R}\mathbf{z}$, which takes $\tilde{O}(n^2)$ work, but the
hidden constants are small and many practical speedups are possible.

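Theorem 5.5. Algorithm 3 is correct.

To prove the theorem we need the following fact about products of Gaussian functions.

Fact 5.6 (Product of degenerate Gaussians). Let $\Sigma_1, \Sigma_2 \in \mathbb{R}^{m\times m}$ be symmetric positive semidefinite matrices,
let $V_i = \mathrm{span}(\Sigma_i)$ for $i = 1, 2$ and $V_3 = V_1 \cap V_2$, let $\mathbf{P} = \mathbf{P}^t \in \mathbb{R}^{m\times m}$ be the symmetric matrix that projects
orthogonally onto $V_3$, and let $\mathbf{c}_1, \mathbf{c}_2 \in \mathbb{R}^m$ be arbitrary. Supposing it exists, let $\mathbf{v}$ be the unique point in
$(V_1 + \mathbf{c}_1) \cap (V_2 + \mathbf{c}_2) \cap V_3^{\perp}$. Then

$$\rho_{\sqrt{\Sigma_1}}(\mathbf{x} - \mathbf{c}_1) \cdot \rho_{\sqrt{\Sigma_2}}(\mathbf{x} - \mathbf{c}_2) = \rho_{\sqrt{\Sigma_1 + \Sigma_2}}(\mathbf{c}_1 - \mathbf{c}_2) \cdot \rho_{\sqrt{\Sigma_3}}(\mathbf{x} - \mathbf{c}_3),$$

where $\Sigma_3$ and $\mathbf{c}_3 \in \mathbf{v} + V_3$ are such that

$$\Sigma_3^{+} = \mathbf{P}(\Sigma_1^{+} + \Sigma_2^{+})\mathbf{P}$$
$$\Sigma_3^{+}(\mathbf{c}_3 - \mathbf{v}) = \Sigma_1^{+}(\mathbf{c}_1 - \mathbf{v}) + \Sigma_2^{+}(\mathbf{c}_2 - \mathbf{v}).$$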

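Algorithm 3 Efficient algorithm $\mathsf{SampleD}^{\mathcal{O}}(\mathbf{R}, \bar{\mathbf{A}}, \mathbf{H}, \mathbf{u}, s)$ for sampling a discrete Gaussian over $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$.

Input: An oracle $\mathcal{O}(\mathbf{v})$ for Gaussian sampling over a desired coset $\Lambda^{\perp}_{\mathbf{v}}(\mathbf{G})$ with fixed parameter $r\sqrt{\Sigma_{\mathbf{G}}} \geq \eta_{\epsilon}(\Lambda^{\perp}(\mathbf{G}))$, for some $\Sigma_{\mathbf{G}} \geq 2$ and $\epsilon \leq 1/2$.

Offline phase:

• partial parity-check matrix $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n \times \bar{m}}$;

• trapdoor matrix $\mathbf{R} \in \mathbb{Z}^{\bar{m} \times w}$;

• positive definite $\Sigma \geq \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] (2 + \Sigma_{\mathbf{G}}) [\mathbf{R}^t \;\; \mathbf{I}]$, e.g., any $\Sigma = s^2 \geq (s_1(\mathbf{R})^2 + 1)(s_1(\Sigma_{\mathbf{G}}) + 2)$.

Online phase:

• invertible tag $\mathbf{H} \in \mathbb{Z}_q^{n \times n}$ defining $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{H}\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}] \in \mathbb{Z}_q^{n \times m}$, for $m = \bar{m} + w$ ($\mathbf{H}$ may instead be provided in the offline phase, if it is known then);

• syndrome $\mathbf{u} \in \mathbb{Z}_q^n$.

Output: A vector $\mathbf{x}$ drawn from a distribution within $O(\epsilon)$ statistical distance of $D_{\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A}),\, r\sqrt{\Sigma}}$.

Offline phase:

1: Choose a fresh perturbation $\mathbf{p} \leftarrow D_{\mathbb{Z}^m,\, r\sqrt{\Sigma_{\mathbf{p}}}}$, where $\Sigma_{\mathbf{p}} = \Sigma - \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \Sigma_{\mathbf{G}} [\mathbf{R}^t \;\; \mathbf{I}] \geq 2 \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] [\mathbf{R}^t \;\; \mathbf{I}]$.

2: Let $\mathbf{p} = \left[\begin{smallmatrix} \mathbf{p}_1 \\ \mathbf{p}_2 \end{smallmatrix}\right]$ for $\mathbf{p}_1 \in \mathbb{Z}^{\bar{m}}$, $\mathbf{p}_2 \in \mathbb{Z}^{w}$, and compute $\bar{\mathbf{w}} = \bar{\mathbf{A}}(\mathbf{p}_1 - \mathbf{R}\mathbf{p}_2) \in \mathbb{Z}_q^n$ and $\mathbf{w} = \mathbf{G}\mathbf{p}_2 \in \mathbb{Z}_q^n$.

Online phase:

3: Let $\mathbf{v} \leftarrow \mathbf{H}^{-1}(\mathbf{u} - \bar{\mathbf{w}}) - \mathbf{w} = \mathbf{H}^{-1}(\mathbf{u} - \mathbf{A}\mathbf{p}) \in \mathbb{Z}_q^n$, and choose $\mathbf{z} \leftarrow D_{\Lambda^{\perp}_{\mathbf{v}}(\mathbf{G}),\, r\sqrt{\Sigma_{\mathbf{G}}}}$ by calling $\mathcal{O}(\mathbf{v})$.

4: return $\mathbf{x} \leftarrow \mathbf{p} + \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \mathbf{z}$.


Proof of Theorem 5.5. We adopt the notation from the algorithm, let $V = \mathrm{span}\left(\left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right]\right) \subset \mathbb{R}^m$, let $\mathbf{P}$ be the matrix that projects orthogonally onto $V$, and define the lattice $\Lambda = \mathbb{Z}^m \cap V = \mathcal{L}\left(\left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right]\right)$, which spans $V$.
We analyze the output distribution of SampleD. Clearly, it always outputs an element of $\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$, so let $\mathbf{x} \in \Lambda^{\perp}_{\mathbf{u}}(\mathbf{A})$ be arbitrary. Now SampleD outputs $\mathbf{x}$ exactly when it chooses in Step 1 some $\mathbf{p} \in V + \mathbf{x}$, followed in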
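Step 3 by the unique $\mathbf{z} \in \Lambda^{\perp}_{\mathbf{v}}(\mathbf{G})$ such that $\mathbf{x} - \mathbf{p} = \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \mathbf{z}$. It is easy to check that $\rho_{\sqrt{\Sigma_{\mathbf{G}}}}(\mathbf{z}) = \rho_{\sqrt{\Sigma_{\mathbf{y}}}}(\mathbf{x} - \mathbf{p})$, where

\[ \Sigma_{\mathbf{y}} = \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \Sigma_{\mathbf{G}} [\mathbf{R}^t \;\; \mathbf{I}] \geq 2 \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] [\mathbf{R}^t \;\; \mathbf{I}] \]

is the covariance matrix with $\mathrm{span}(\Sigma_{\mathbf{y}}) = V$. Note that $\Sigma_{\mathbf{p}} + \Sigma_{\mathbf{y}} = \Sigma$ by definition of $\Sigma_{\mathbf{p}}$, and that $\mathrm{span}(\Sigma_{\mathbf{p}}) = \mathbb{R}^m$ because $\Sigma_{\mathbf{p}} > 0$. Therefore, we have (where $C$ denotes a normalizing constant that may vary from line to line, but does not depend on $\mathbf{x}$):

\[
\begin{aligned}
p_{\mathbf{x}} &= \Pr[\text{SampleD outputs } \mathbf{x}] \\
&= \sum_{\mathbf{p} \in \mathbb{Z}^m \cap (V + \mathbf{x})} D_{\mathbb{Z}^m,\, r\sqrt{\Sigma_{\mathbf{p}}}}(\mathbf{p}) \cdot D_{\Lambda^{\perp}_{\mathbf{v}}(\mathbf{G}),\, r\sqrt{\Sigma_{\mathbf{G}}}}(\mathbf{z}) && \text{(def. of SampleD)} \\
&= C \sum_{\mathbf{p}} \rho_{r\sqrt{\Sigma_{\mathbf{p}}}}(\mathbf{p}) \cdot \rho_{r\sqrt{\Sigma_{\mathbf{y}}}}(\mathbf{p} - \mathbf{x}) \,/\, \rho_{r\sqrt{\Sigma_{\mathbf{G}}}}(\Lambda^{\perp}_{\mathbf{v}}(\mathbf{G})) && \text{(def. of } D\text{)} \\
&= C \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x}) \cdot \sum_{\mathbf{p}} \rho_{r\sqrt{\Sigma_3}}(\mathbf{p} - \mathbf{c}_3) \,/\, \rho_{r\sqrt{\Sigma_{\mathbf{G}}}}(\Lambda^{\perp}_{\mathbf{v}}(\mathbf{G})) && \text{(Fact 5.6)} \\
&\in C \left[1, \tfrac{1+\epsilon}{1-\epsilon}\right] \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x}) \cdot \sum_{\mathbf{p}} \rho_{r\sqrt{\Sigma_3}}(\mathbf{p} - \mathbf{c}_3) && \text{(Lemma 2.5 and } r\sqrt{\Sigma_{\mathbf{G}}} \geq \eta_{\epsilon}(\Lambda^{\perp}(\mathbf{G}))\text{)} \\
&= C \left[1, \tfrac{1+\epsilon}{1-\epsilon}\right] \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x}) \cdot \rho_{r\sqrt{\Sigma_3}}(\mathbb{Z}^m \cap (V + \mathbf{x}) - \mathbf{c}_3), && (5.1)
\end{aligned}
\]

where $\Sigma_3^{+} = \mathbf{P}(\Sigma_{\mathbf{p}}^{+} + \Sigma_{\mathbf{y}}^{+})\mathbf{P}$ and $\mathbf{c}_3 \in \mathbf{v} + V = \mathbf{x} + V$, because the component of $\mathbf{x}$ orthogonal to $V$ is the unique point $\mathbf{v} \in (V + \mathbf{x}) \cap V^{\perp}$. Therefore,

\[ \mathbb{Z}^m \cap (V + \mathbf{x}) - \mathbf{c}_3 = (\mathbb{Z}^m \cap V) + (\mathbf{x} - \mathbf{c}_3) \subset V \]

is a coset of the lattice $\Lambda = \mathcal{L}\left(\left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right]\right)$. It remains to show that $r\sqrt{\Sigma_3} \geq \eta_{\epsilon}(\Lambda)$, so that the rightmost term in (5.1) above is essentially a constant (up to some factor in $[\tfrac{1-\epsilon}{1+\epsilon}, 1]$) independent of $\mathbf{x}$, by Lemma 2.5. Then we can conclude that $p_{\mathbf{x}} \in [\tfrac{1-\epsilon}{1+\epsilon}, \tfrac{1+\epsilon}{1-\epsilon}] \cdot \rho_{r\sqrt{\Sigma}}(\mathbf{x})$, from which the theorem follows.

To show that $r\sqrt{\Sigma_3} \geq \eta_{\epsilon}(\Lambda)$, note that since $\Lambda^{*} \subset V$, for any covariance $\Pi$ we have $\rho_{\mathbf{P}\sqrt{\Pi}}(\Lambda^{*}) = \rho_{\sqrt{\Pi}}(\Lambda^{*})$, and so $\mathbf{P}\sqrt{\Pi} \geq \eta_{\epsilon}(\Lambda)$ if and only if $\sqrt{\Pi} \geq \eta_{\epsilon}(\Lambda)$. Now because both $\Sigma_{\mathbf{p}}, \Sigma_{\mathbf{y}} \geq 2 \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] [\mathbf{R}^t \;\; \mathbf{I}]$, we have

\[ \Sigma_{\mathbf{p}}^{+} + \Sigma_{\mathbf{y}}^{+} \leq \left( \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] [\mathbf{R}^t \;\; \mathbf{I}] \right)^{+}. \]

Because $r \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \geq \eta_{\epsilon}(\Lambda)$ for $\epsilon = \mathrm{negl}(n)$ by Lemma 2.3, we have $r\sqrt{\Sigma_3} = r\sqrt{(\Sigma_{\mathbf{p}}^{+} + \Sigma_{\mathbf{y}}^{+})^{+}} \geq \eta_{\epsilon}(\Lambda)$, as desired.
□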
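
The syndrome bookkeeping of Algorithm 3 (Steps 2 to 4) can be checked independently of the Gaussian analysis. The following minimal sketch is not the sampler itself: it assumes a scalar tag H = c·I, replaces the Gaussian perturbation with small uniform integers, and replaces the oracle O(v) with a plain binary decomposition of v. It only demonstrates that the returned x satisfies A x = u mod q. All names and toy parameters are hypothetical.

```python
import random

def matvec(M, x, q):
    """Matrix-vector product mod q (M is a list of rows)."""
    return [sum(a * b for a, b in zip(row, x)) % q for row in M]

def gadget(n, k):
    """G = I_n tensor (1, 2, ..., 2^(k-1)), an n x nk integer matrix."""
    return [[(1 << (j % k)) if j // k == i else 0 for j in range(n * k)]
            for i in range(n)]

def bit_preimage(v, k):
    """Some z in {0,1}^(nk) with G z = v; a non-Gaussian stand-in for the oracle O(v)."""
    return [(vi >> j) & 1 for vi in v for j in range(k)]

def toy_sample(u, n=3, k=4, q=13, mbar=5, c=3, seed=1):
    """Run the arithmetic of Algorithm 3 with tag H = c*I; return (A, x)."""
    rng = random.Random(seed)
    w = n * k
    G = gadget(n, k)
    Abar = [[rng.randrange(q) for _ in range(mbar)] for _ in range(n)]
    R = [[rng.choice((-1, 0, 1)) for _ in range(w)] for _ in range(mbar)]
    # A = [Abar | H G - Abar R] mod q.
    AbarR = [[sum(Abar[i][l] * R[l][j] for l in range(mbar)) for j in range(w)]
             for i in range(n)]
    A = [Abar[i] + [(c * G[i][j] - AbarR[i][j]) % q for j in range(w)]
         for i in range(n)]
    # Steps 1-2 (offline): perturbation p = (p1, p2) and the two partial syndromes.
    p1 = [rng.randint(-2, 2) for _ in range(mbar)]
    p2 = [rng.randint(-2, 2) for _ in range(w)]
    Rp2 = [sum(R[i][j] * p2[j] for j in range(w)) for i in range(mbar)]
    wbar = matvec(Abar, [a - b for a, b in zip(p1, Rp2)], q)
    wvec = matvec(G, p2, q)
    # Step 3 (online): v = H^{-1}(u - wbar) - w, then z with G z = v.
    c_inv = pow(c, -1, q)
    v = [(c_inv * (ui - wb) - wi) % q for ui, wb, wi in zip(u, wbar, wvec)]
    z = bit_preimage(v, k)
    # Step 4: x = p + [R; I] z.
    Rz = [sum(R[i][j] * z[j] for j in range(w)) for i in range(mbar)]
    x = [a + b for a, b in zip(p1, Rz)] + [a + b for a, b in zip(p2, z)]
    return A, x
```

The check works because A·[R; I] = H·G, so A x = A p + H v = u. Obtaining the correct output distribution additionally requires the Gaussian choices of p and z described in the algorithm.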


5.5  Trapdoor  Delegation 

Here we describe a very simple and efficient mechanism for securely delegating a trapdoor for $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ to a trapdoor for an extension $\mathbf{A}' \in \mathbb{Z}_q^{n \times m'}$ of $\mathbf{A}$. Our method has several advantages over the previous basis delegation algorithm of [CHKP10]. First and most importantly, the size of the delegated trapdoor grows only linearly with the dimension $m'$ of $\Lambda^{\perp}(\mathbf{A}')$, rather than quadratically. Second, the algorithm is much more efficient, because it does not require testing linear independence of Gaussian samples, nor computing the expensive ToBasis and Hermite normal form operations. Third, the resulting trapdoor $\mathbf{R}$ has a ‘nice’ Gaussian distribution that is easy to analyze and may be useful in applications. We do note that while the delegation algorithm from [CHKP10] works for any extension $\mathbf{A}'$ of $\mathbf{A}$ (including $\mathbf{A}$ itself), ours requires $m' \geq m + w$. Fortunately, this is frequently the case in applications such as HIBE and others that use delegation.


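Algorithm 4 Efficient algorithm $\mathsf{DelTrap}^{\mathcal{O}}(\mathbf{A}' = [\mathbf{A} \mid \mathbf{A}_1], \mathbf{H}', s')$ for delegating a trapdoor.

Input: an oracle $\mathcal{O}$ for discrete Gaussian sampling over cosets of $\Lambda = \Lambda^{\perp}(\mathbf{A})$ with parameter $s' \geq \eta_{\epsilon}(\Lambda)$.

• parity-check matrix $\mathbf{A}' = [\mathbf{A} \mid \mathbf{A}_1] \in \mathbb{Z}_q^{n \times m} \times \mathbb{Z}_q^{n \times w}$;

• invertible matrix $\mathbf{H}' \in \mathbb{Z}_q^{n \times n}$;

Output: a trapdoor $\mathbf{R}' \in \mathbb{Z}^{m \times w}$ for $\mathbf{A}'$ with tag $\mathbf{H}' \in \mathbb{Z}_q^{n \times n}$.

1: Using $\mathcal{O}$, sample each column of $\mathbf{R}'$ independently from a discrete Gaussian with parameter $s'$ over the appropriate coset of $\Lambda^{\perp}(\mathbf{A})$, so that $\mathbf{A}\mathbf{R}' = \mathbf{H}'\mathbf{G} - \mathbf{A}_1$.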


Usually, the oracle $\mathcal{O}$ needed by Algorithm 4 would be implemented (up to $\mathrm{negl}(n)$ statistical distance) by Algorithm 3 above, using a trapdoor $\mathbf{R}$ for $\mathbf{A}$ where $s_1(\mathbf{R})$ is sufficiently small relative to $s'$. The following is immediate from Lemma 2.9 and the fact that the columns of $\mathbf{R}'$ are independent and $\mathrm{negl}(n)$-subgaussian. A relatively tight bound on the hidden constant factor can also be derived from Lemma 2.9.

Lemma 5.7. For any valid inputs $\mathbf{A}'$ and $\mathbf{H}'$, Algorithm 4 outputs a trapdoor $\mathbf{R}'$ for $\mathbf{A}'$ with tag $\mathbf{H}'$, whose distribution is the same for any valid implementation of $\mathcal{O}$, and $s_1(\mathbf{R}') \leq s' \cdot O(\sqrt{m} + \sqrt{w})$ except with negligible probability.

6  Applications


The  main  applications  of  “strong”  trapdoors  have  included  digital  signature  schemes  in  both  the  random- 
oracle  and  standard  models,  encryption  secure  under  chosen-ciphertext  attack  (CCA),  and  (hierarchical) 
identity-based  encryption.  Here  we  focus  on  signature  schemes  and  CCA-secure  encryption,  where  our 
techniques  lead  to  significant  new  improvements  (beyond  what  is  obtained  by  plugging  in  our  trapdoor 
generator  as  a  “black  box”).  Where  appropriate,  we  also  briefly  mention  the  improvements  that  are  possible 
in  the  remaining  applications. 

6.1  Algebraic  Background 

In our applications we need a special collection of elements from a certain ring $\mathcal{R}$, which induce invertible matrices $\mathbf{H} \in \mathbb{Z}_q^{n \times n}$ as required by our trapdoor construction. We construct such a ring using ideas from the literature on secret sharing over groups and modules, e.g., [DF94, Feh98]. Define the ring $\mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ for some monic degree-$n$ polynomial $f(x) = x^n + f_{n-1}x^{n-1} + \cdots + f_0 \in \mathbb{Z}[x]$ that is irreducible modulo every prime $p$ dividing $q$. (Such an $f(x)$ can be constructed by finding monic irreducible degree-$n$ polynomials in $\mathbb{Z}_p[x]$ for each prime $p$ dividing $q$, and using the Chinese remainder theorem on their coefficients to get $f(x)$.) Recall that $\mathcal{R}$ is a free $\mathbb{Z}_q$-module of rank $n$, i.e., the elements of $\mathcal{R}$ can be represented as vectors in $\mathbb{Z}_q^n$ relative to the standard basis of monomials $1, x, \ldots, x^{n-1}$. Multiplication by any fixed element of $\mathcal{R}$ then acts as a linear transformation on $\mathbb{Z}_q^n$ according to the rule $x \cdot (a_0, \ldots, a_{n-1})^t = (0, a_0, \ldots, a_{n-2})^t - a_{n-1}(f_0, f_1, \ldots, f_{n-1})^t$, and so can be represented by an (efficiently computable) matrix in $\mathbb{Z}_q^{n \times n}$ relative to the standard basis. In other words, there is an injective ring homomorphism $h \colon \mathcal{R} \to \mathbb{Z}_q^{n \times n}$ that maps any $a \in \mathcal{R}$ to the matrix $\mathbf{H} = h(a)$ representing multiplication by $a$. In particular, $\mathbf{H}$ is invertible if and only if $a \in \mathcal{R}^{*}$, the set of units in $\mathcal{R}$. By the Chinese remainder theorem, and because $\mathbb{Z}_p[x]/(f(x))$ is a field by construction of $f(x)$, an element $a \in \mathcal{R}$ is a unit exactly when it is nonzero (as a polynomial residue) modulo every prime $p$ dividing $q$. We use this fact quite essentially in the constructions that follow.
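
The map h can be made concrete with a few lines of code. The sketch below is illustrative only: it represents ring elements as low-to-high coefficient lists, represents the monic polynomial f by its non-leading coefficients (f_0, ..., f_{n-1}), and builds h(a) column by column from the multiplication-by-x rule stated above. The helper names are assumptions, not part of the construction.

```python
def mul_by_x(a, f, q):
    """x * a in Z_q[x]/(f): shift up, then subtract a_{n-1} * (f_0, ..., f_{n-1})."""
    top = a[-1]
    shifted = [0] + a[:-1]
    return [(s - top * fi) % q for s, fi in zip(shifted, f)]

def ring_mul(a, b, f, q):
    """Product a * b in Z_q[x]/(f), computed as sum_i a_i * (b * x^i)."""
    acc = [0] * len(f)
    power = [bi % q for bi in b]          # b * x^i, starting with i = 0
    for ai in a:
        acc = [(c + ai * p) % q for c, p in zip(acc, power)]
        power = mul_by_x(power, f, q)
    return acc

def h(a, f, q):
    """Matrix of multiplication by a: column j holds the coefficients of a * x^j."""
    n = len(f)
    cols, col = [], [ai % q for ai in a]
    for _ in range(n):
        cols.append(col)
        col = mul_by_x(col, f, q)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matmul(X, Y, q):
    """Product of two matrices (lists of rows), reduced mod q."""
    return [[sum(x * y for x, y in zip(row, col)) % q for col in zip(*Y)]
            for row in X]
```

For example, with q = 4 and f(x) = x^2 + x + 1 (irreducible modulo 2), the matrices h(a)·h(b) and h(a·b) coincide, which is the homomorphism property used throughout Section 6.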

6.2  Signature  Schemes 
6.2.1  Definitions 

A signature scheme SIG for a message space $\mathcal{M}$ (which may depend on the security parameter $n$) is a tuple of PPT algorithms as follows:

• $\mathsf{Gen}(1^n)$ outputs a verification key $vk$ and a signing key $sk$.

• $\mathsf{Sign}(sk, \mu)$, given a signing key $sk$ and a message $\mu \in \mathcal{M}$, outputs a signature $\sigma \in \{0, 1\}^{*}$.

• $\mathsf{Ver}(vk, \mu, \sigma)$, given a verification key $vk$, a message $\mu$, and a signature $\sigma$, either accepts or rejects.

The correctness requirement is: for any $\mu \in \mathcal{M}$, generate $(vk, sk) \leftarrow \mathsf{Gen}(1^n)$ and $\sigma \leftarrow \mathsf{Sign}(sk, \mu)$. Then $\mathsf{Ver}(vk, \mu, \sigma)$ should accept with overwhelming probability (over all the randomness in the experiment).

We recall two standard notions of security for signatures. An intermediate notion, strong unforgeability under static chosen-message attack, or su-scma security, is defined as follows: first, the forger $\mathcal{F}$ outputs a list of distinct query messages $\mu^{(1)}, \ldots, \mu^{(Q)}$ for some $Q$. (The distinctness condition simplifies our construction, and does not affect the notion’s usefulness.) Next, we generate $(vk, sk) \leftarrow \mathsf{Gen}(1^n)$ and $\sigma^{(i)} \leftarrow \mathsf{Sign}(sk, \mu^{(i)})$ for each $i \in [Q]$, then give $vk$ and each $\sigma^{(i)}$ to $\mathcal{F}$. Finally, $\mathcal{F}$ outputs an attempted forgery $(\mu^{*}, \sigma^{*})$. The forger’s advantage $\mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F})$ is the probability that $\mathsf{Ver}(vk, \mu^{*}, \sigma^{*})$ accepts and $(\mu^{*}, \sigma^{*}) \neq (\mu^{(i)}, \sigma^{(i)})$ for all $i \in [Q]$, taken over all the randomness of the experiment. The scheme is su-scma-secure if $\mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F}) = \mathrm{negl}(n)$ for every nonuniform probabilistic polynomial-time algorithm $\mathcal{F}$.

Another notion, called strong existential unforgeability under adaptive chosen-message attack, or su-acma security, is defined similarly, except that $\mathcal{F}$ is first given $vk$ and may adaptively choose the messages $\mu^{(i)}$ to be signed, which need not be distinct.

Using a family of chameleon hash functions, there is a generic transformation from eu-scma- to eu-acma-security; see, e.g., [KR00]. Furthermore, the transformation results in an offline/online scheme in which the Sign algorithm can be precomputed before the message to be signed is known; see [ST01]. The basic idea is that the signer chameleon hashes the true message, then signs the hash value using the eu-scma-secure scheme (and includes the randomness used in the chameleon hash with the final signature). A suitable type of chameleon hash function has been constructed under a weak hardness-of-SIS assumption; see [CHKP10].


6.2.2  Standard  Model  Scheme 


Here we give a signature scheme that is statically secure in the standard model. The scheme itself is essentially identical (up to the improved and generalized parameters) to the one of [Boy10], which is a lattice analogue of the pairing-based signature of [Wat05]. We give a new proof with an improved security reduction that relies on a weaker assumption. The proof uses a variant of the “prefix technique” [HW09] also used in [CHKP10].

Our scheme involves a number of parameters. For simplicity, we give some exemplary asymptotic bounds here. (Other slight trade-offs among the parameters are possible, and more precise values can be obtained using the more exact bounds from earlier in the paper and the material below.) In what follows, $\omega(\sqrt{\log n})$ represents a fixed function that asymptotically grows faster than $\sqrt{\log n}$.


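• $\mathbf{G} \in \mathbb{Z}_q^{n \times nk}$ is a gadget matrix for large enough $q = \mathrm{poly}(n)$ and $k = \lceil \log q \rceil = O(\log n)$, with the ability to sample from cosets of $\Lambda^{\perp}(\mathbf{G})$ with Gaussian parameter $O(1) \cdot \omega(\sqrt{\log n}) \geq \eta_{\epsilon}(\Lambda^{\perp}(\mathbf{G}))$. (See for example the constructions from Section 4.)

• $\bar{m} = O(nk)$ and $\mathcal{D} = D_{\mathbb{Z},\, \omega(\sqrt{\log n})}^{\bar{m} \times nk}$ so that $(\bar{\mathbf{A}}, \bar{\mathbf{A}}\mathbf{R})$ is $\mathrm{negl}(n)$-far from uniform for $\bar{\mathbf{A}} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$ and $\mathbf{R} \leftarrow \mathcal{D}$, and $m = \bar{m} + 2nk$ is the total dimension of the signatures.

• $\ell$ is a suitable message length (see below), and $s = O(\sqrt{\ell nk}) \cdot \omega(\sqrt{\log n})^2$ is a sufficiently large Gaussian parameter.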


The legal values of $\ell$ are influenced by the choice of $q$ and $n$. Our security proof requires a special collection of units in the ring $\mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ as constructed in Section 6.1 above. We need a sequence of $\ell$ units $u_1, \ldots, u_\ell \in \mathcal{R}^{*}$, not necessarily distinct, such that any nontrivial subset-sum is also a unit, i.e., for any nonempty $S \subseteq [\ell]$, $\sum_{i \in S} u_i \in \mathcal{R}^{*}$. By the characterization of units in $\mathcal{R}$ described in Section 6.1, letting $p$ be the smallest prime dividing $q$, we can allow any $\ell \leq (p - 1) \cdot n$ by taking $p - 1$ copies of each of the monomials $x^i \in \mathcal{R}^{*}$ for $i = 0, \ldots, n - 1$.
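
The subset-sum property of this choice of units can be confirmed by brute force for tiny parameters. The sketch below assumes, for simplicity, that q is a power of the prime p, so that being a unit is equivalent to having some coefficient that is nonzero modulo p. It enumerates every nonempty subset of the p − 1 copies of each monomial. The function names are illustrative.

```python
from itertools import combinations

def monomial_units(p, n):
    """p - 1 copies of each monomial x^i, as coefficient vectors of length n."""
    units = []
    for i in range(n):
        e = [0] * n
        e[i] = 1
        units.extend([e] * (p - 1))
    return units

def all_subset_sums_nonzero_mod_p(units, p):
    """True if every nonempty subset-sum has some coefficient that is nonzero mod p."""
    n = len(units[0])
    for size in range(1, len(units) + 1):
        for subset in combinations(units, size):
            total = [sum(u[j] for u in subset) % p for j in range(n)]
            if not any(total):
                return False
    return True
```

With p = 3 and n = 2 this gives ℓ = 4 units and the check succeeds. Adding a third copy of the monomial 1 breaks the property, because the three copies sum to 3, which is zero modulo p. This is why the bound is ℓ ≤ (p − 1)·n for this particular choice.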

The signature scheme has message space $\{0, 1\}^{\ell}$, and is defined as follows.


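• $\mathsf{Gen}(1^n)$: choose $\bar{\mathbf{A}} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$, choose $\mathbf{R} \in \mathbb{Z}^{\bar{m} \times nk}$ from distribution $\mathcal{D}$, and let $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$. For $i = 0, 1, \ldots, \ell$, choose $\mathbf{A}_i \leftarrow \mathbb{Z}_q^{n \times nk}$. Also choose a syndrome $\mathbf{u} \leftarrow \mathbb{Z}_q^n$.

The public verification key is $vk = (\mathbf{A}, \mathbf{A}_0, \ldots, \mathbf{A}_\ell, \mathbf{u})$. The secret signing key is $sk = \mathbf{R}$.

• $\mathsf{Sign}(sk, \mu \in \{0, 1\}^{\ell})$: let $\mathbf{A}_\mu = \left[ \mathbf{A} \mid \mathbf{A}_0 + \sum_{i \in [\ell]} \mu_i \mathbf{A}_i \right] \in \mathbb{Z}_q^{n \times m}$, where $\mu_i \in \{0, 1\}$ is the $i$th bit of $\mu$, interpreted as an integer. Output $\mathbf{v} \in \mathbb{Z}^m$ sampled from $D_{\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A}_\mu),\, s}$, using SampleD with trapdoor $\mathbf{R}$ for $\mathbf{A}$ (which is also a trapdoor for its extension $\mathbf{A}_\mu$).

• $\mathsf{Ver}(vk, \mu, \mathbf{v})$: let $\mathbf{A}_\mu$ be as above. Accept if $\|\mathbf{v}\| \leq s \cdot \sqrt{m}$ and $\mathbf{A}_\mu \cdot \mathbf{v} = \mathbf{u}$; otherwise, reject.

Notice that the signing process takes $O(\ell n^2 k)$ scalar operations (to add up the $\mathbf{A}_i$s), but after transforming the scheme to a fully secure one using chameleon hashing, these computations can be performed offline before the message is known.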
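
The verification step is simple enough to sketch directly. The code below is a minimal illustration with matrices as lists of rows and toy dimensions, not an implementation of the scheme: it builds A_mu from the key components and applies the two acceptance tests of Ver. The argument layout (A_list holding A_0, ..., A_ell) is an assumption made for this example.

```python
import math

def build_A_mu(A, A_list, mu, q):
    """Return A_mu = [A | A_0 + sum_i mu_i * A_i] mod q, where A_list = [A_0, ..., A_ell]."""
    n, width = len(A), len(A_list[0][0])
    tail = [[(A_list[0][r][c]
              + sum(bit * A_list[i + 1][r][c] for i, bit in enumerate(mu))) % q
             for c in range(width)] for r in range(n)]
    return [A[r] + tail[r] for r in range(n)]

def verify(A, A_list, u, mu, v, s, q):
    """Accept iff ||v|| <= s * sqrt(m) and A_mu v = u (mod q)."""
    A_mu = build_A_mu(A, A_list, mu, q)
    m = len(A_mu[0])
    if len(v) != m:
        return False
    short = math.sqrt(sum(x * x for x in v)) <= s * math.sqrt(m)
    syndrome = [sum(a * x for a, x in zip(row, v)) % q for row in A_mu]
    return short and syndrome == [ui % q for ui in u]
```

Signing, by contrast, needs the trapdoor R and a Gaussian sampler such as SampleD, which this sketch does not attempt.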

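Theorem 6.1. There exists a PPT oracle algorithm (a reduction) $\mathcal{S}$ attacking the $\mathrm{SIS}_{q,\beta}$ problem for large enough $\beta = O(\ell (nk)^{3/2}) \cdot \omega(\sqrt{\log n})^3$ such that, for any adversary $\mathcal{F}$ mounting an su-scma attack on SIG and making at most $Q$ queries,

\[ \mathrm{Adv}_{\mathrm{SIS}_{q,\beta}}(\mathcal{S}^{\mathcal{F}}) \geq \mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F}) / (2(\ell - 1)Q + 2) - \mathrm{negl}(n). \]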

Proof. Let $\mathcal{F}$ be an adversary mounting an su-scma attack on SIG, having advantage $\delta = \mathrm{Adv}^{\text{su-scma}}_{\mathrm{SIG}}(\mathcal{F})$. We construct a reduction $\mathcal{S}$ attacking $\mathrm{SIS}_{q,\beta}$. The reduction $\mathcal{S}$ takes as input $\bar{m} + nk + 1$ uniformly random and independent samples from $\mathbb{Z}_q^n$, parsing them as a matrix $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{B}] \in \mathbb{Z}_q^{n \times (\bar{m} + nk)}$ and syndrome $\mathbf{u}' \in \mathbb{Z}_q^n$. It will use $\mathcal{F}$ either to find some $\mathbf{z} \in \mathbb{Z}^{\bar{m} + nk}$ of length $\|\mathbf{z}\| \leq \beta - 1$ such that $\mathbf{A}\mathbf{z} = \mathbf{u}'$ (from which it follows that $[\mathbf{A} \mid \mathbf{u}'] \cdot \mathbf{z}' = \mathbf{0}$, where $\mathbf{z}' = \left[\begin{smallmatrix} \mathbf{z} \\ -1 \end{smallmatrix}\right]$ is nonzero and of length at most $\beta$), or a nonzero $\mathbf{z} \in \mathbb{Z}^{\bar{m} + nk}$ such that $\mathbf{A}\mathbf{z} = \mathbf{0}$ (from which it follows that $[\mathbf{A} \mid \mathbf{u}'] \cdot \left[\begin{smallmatrix} \mathbf{z} \\ 0 \end{smallmatrix}\right] = \mathbf{0}$).

We distinguish between two types of forger $\mathcal{F}$: one that produces a forgery on an unqueried message (a violation of standard existential unforgeability), and one that produces a new signature on a queried message (a violation of strong unforgeability). Clearly any $\mathcal{F}$ with advantage $\delta$ has probability at least $\delta/2$ of succeeding in at least one of these two tasks.

First we consider $\mathcal{F}$ that forges on an unqueried message (with probability at least $\delta/2$). Our reduction $\mathcal{S}$ simulates the static chosen-message attack to $\mathcal{F}$ as follows:

• Invoke $\mathcal{F}$ to receive up to $Q$ messages $\mu^{(1)}, \mu^{(2)}, \ldots \in \{0, 1\}^{\ell}$. Compute the set $P$ of all strings $p \in \{0, 1\}^{\leq \ell}$ having the property that $p$ is a shortest string for which no $\mu^{(j)}$ has $p$ as a prefix. Equivalently, $P$ represents the set of maximal subtrees of $\{0, 1\}^{\leq \ell}$ (viewed as a tree) that do not contain any of the queried messages. The set $P$ has size at most $(\ell - 1) \cdot Q + 1$, and may be computed efficiently. (See, e.g., [CHKP10] for a precise description of an algorithm.) Choose some $p$ from $P$ uniformly at random, letting $t = |p| \leq \ell$.

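• Construct a verification key $vk = (\mathbf{A}, \mathbf{A}_0, \ldots, \mathbf{A}_\ell, \mathbf{u} = \mathbf{u}')$: for $i = 0, \ldots, \ell$, choose $\mathbf{R}_i \leftarrow \mathcal{D}$, and let

\[ \mathbf{A}_i = \mathbf{H}_i \mathbf{G} - \bar{\mathbf{A}}\mathbf{R}_i, \quad \text{where} \quad \mathbf{H}_i = \begin{cases} h(0) = \mathbf{0} & i > t \\ (-1)^{p_i} \cdot h(u_i) & i \in [t] \\ -\sum_{j \in [t]} p_j \cdot \mathbf{H}_j & i = 0. \end{cases} \]

(Recall that $u_1, \ldots, u_\ell \in \mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ are units whose nontrivial subset-sums are also units.)

Note that by hypothesis on $\bar{m}$ and $\mathcal{D}$, for any choice of $p$ the key $vk$ is only $\mathrm{negl}(n)$-far from uniform in statistical distance. Note also that by our choice of the $\mathbf{H}_i$, for any message $\mu \in \{0, 1\}^{\ell}$ having $p$ as a prefix, we have $\mathbf{H}_0 + \sum_{i \in [\ell]} \mu_i \mathbf{H}_i = \mathbf{0}$. Whereas for any $\mu \in \{0, 1\}^{\ell}$ having $p' \neq p$ as its $t$-bit prefix, we have

\[ \mathbf{H}_0 + \sum_{i \in [\ell]} \mu_i \mathbf{H}_i = \sum_{i \in [t]} (p'_i - p_i) \cdot \mathbf{H}_i = \sum_{i \in [t],\, p'_i \neq p_i} (-1)^{p_i} \cdot \mathbf{H}_i = h\Big( \sum_{i \in [t],\, p'_i \neq p_i} u_i \Big), \]

which is invertible by hypothesis on the $u_i$s. Finally, observe that with overwhelming probability over any fixed choice of $vk$ and the $\mathbf{H}_i$, each column of each $\mathbf{R}_i$ is still independently distributed as a discrete Gaussian with parameter $\omega(\sqrt{\log n}) \geq \eta_{\epsilon}(\Lambda^{\perp}(\bar{\mathbf{A}}))$ over some fixed coset of $\Lambda^{\perp}(\bar{\mathbf{A}})$, for some negligible $\epsilon = \epsilon(n)$.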

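• Generate signatures for the queried messages: for each message $\mu = \mu^{(j)}$, compute

\[ \mathbf{A}_\mu = \Big[ \mathbf{A} \;\Big|\; \mathbf{A}_0 + \sum_{i \in [\ell]} \mu_i \mathbf{A}_i \Big] = \Big[ \bar{\mathbf{A}} \;\Big|\; \mathbf{B} \;\Big|\; \mathbf{H}\mathbf{G} - \bar{\mathbf{A}} \Big( \mathbf{R}_0 + \sum_{i \in [\ell]} \mu_i \mathbf{R}_i \Big) \Big], \]

where $\mathbf{H}$ is invertible because the $t$-bit prefix of $\mu$ is not $p$. Therefore, $\mathbf{R} = \mathbf{R}_0 + \sum_{i \in [\ell]} \mu_i \mathbf{R}_i$ is a trapdoor for $\mathbf{A}_\mu$. By the conditional distribution on the $\mathbf{R}_i$s, concatenation of subgaussian random variables, and Lemma 2.9, we have

\[ s_1(\mathbf{R}) = \sqrt{\ell + 1} \cdot O(\sqrt{\bar{m}} + \sqrt{nk}) \cdot \omega(\sqrt{\log n}) = O(\sqrt{\ell nk}) \cdot \omega(\sqrt{\log n}) \]

with overwhelming probability. Since $s = O(\sqrt{\ell nk}) \cdot \omega(\sqrt{\log n})^2$ is sufficiently large, we can generate a properly distributed signature $\mathbf{v}_\mu \leftarrow D_{\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A}_\mu),\, s}$ using SampleD with trapdoor $\mathbf{R}$.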
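
The prefix set P and the resulting "punctured" tags can be illustrated with a toy computation. The sketch below makes two simplifying assumptions that are not part of the reduction: messages are Python bit strings, and the ring is the scalar ring Z_q with q prime (the case n = 1, where h(u) = u), with every unit u_i equal to 1. It computes P from the queries, builds H_0, ..., H_ell for a chosen p, and evaluates the tag H_0 + sum_i mu_i H_i of a message.

```python
from itertools import product

def prefix_set(queries, ell):
    """Shortest strings p (|p| <= ell) that are not a prefix of any queried message."""
    if not queries:
        return {""}
    P = set()
    for mu in queries:
        for j in range(ell):
            # Follow mu for j bits, then branch away from it.
            cand = mu[:j] + ("1" if mu[j] == "0" else "0")
            if not any(other.startswith(cand) for other in queries):
                P.add(cand)
    return P

def tags(p, units, q):
    """Scalar tags H_0, ..., H_ell for prefix p in the toy ring Z_q."""
    t, ell = len(p), len(units)
    H = [0] * (ell + 1)                       # H_i = 0 for i > t
    for i in range(1, t + 1):
        H[i] = ((-1) ** int(p[i - 1]) * units[i - 1]) % q
    H[0] = -sum(int(p[i - 1]) * H[i] for i in range(1, t + 1)) % q
    return H

def message_tag(H, mu, q):
    """H_0 + sum_i mu_i * H_i mod q."""
    return (H[0] + sum(int(b) * H[i + 1] for i, b in enumerate(mu))) % q

def all_messages(ell):
    """All bit strings of length ell."""
    return ["".join(bits) for bits in product("01", repeat=ell)]
```

For queries 000 and 011 with ell = 3, this yields P = {1, 001, 010}. For each p in P, the tag of a message is zero exactly when the message starts with p, and otherwise equals the number of positions in which its t-bit prefix differs from p, which is a unit modulo q = 7. This mirrors the two cases in the displayed identity for the H_i.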

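Next, $\mathcal{S}$ gives $vk$ and the generated signatures to $\mathcal{F}$. Because $vk$ and the signatures are distributed within $\mathrm{negl}(n)$ statistical distance of those in the real attack (for any choice of the prefix $p$), with probability at least $\delta/2 - \mathrm{negl}(n)$, $\mathcal{F}$ outputs a forgery $(\mu^{*}, \mathbf{v}^{*})$ where $\mu^{*}$ is different from all the queried messages, $\mathbf{A}_{\mu^{*}} \mathbf{v}^{*} = \mathbf{u}$, and $\|\mathbf{v}^{*}\| \leq s \cdot \sqrt{m}$. Furthermore, conditioned on this event, $\mu^{*}$ has $p$ as a prefix with probability at least $1/((\ell - 1)Q + 1) - \mathrm{negl}(n)$, because $p$ is still essentially uniform in $P$ conditioned on the view of $\mathcal{F}$. Therefore, all of these events occur with probability at least $\delta/(2(\ell - 1)Q + 2) - \mathrm{negl}(n)$.

In such a case, $\mathcal{S}$ extracts a solution to its SIS challenge instance from the forgery $(\mu^{*}, \mathbf{v}^{*})$ as follows. Because $\mu^{*}$ starts with $p$, we have $\mathbf{A}_{\mu^{*}} = [\bar{\mathbf{A}} \mid \mathbf{B} \mid -\bar{\mathbf{A}}\mathbf{R}^{*}]$ for $\mathbf{R}^{*} = \mathbf{R}_0 + \sum_{i \in [\ell]} \mu^{*}_i \mathbf{R}_i$, and so

\[ [\bar{\mathbf{A}} \mid \mathbf{B}] \underbrace{\begin{bmatrix} \mathbf{I}_{\bar{m}} & & -\mathbf{R}^{*} \\ & \mathbf{I}_{nk} & \end{bmatrix} \mathbf{v}^{*}}_{\mathbf{z}} = \mathbf{u} \bmod q, \]

as desired. Because $\|\mathbf{v}^{*}\| \leq s \cdot \sqrt{m} = O(\sqrt{\ell}\, nk) \cdot \omega(\sqrt{\log n})^2$ and $s_1(\mathbf{R}^{*}) = \sqrt{\ell + 1} \cdot O(\sqrt{\bar{m}} + \sqrt{nk}) \cdot \omega(\sqrt{\log n})$ with overwhelming probability (conditioned on the view of $\mathcal{F}$ and any fixed $\mathbf{H}_i$), we have $\|\mathbf{z}\| = O(\ell (nk)^{3/2}) \cdot \omega(\sqrt{\log n})^3$, which is at most $\beta - 1$, as desired.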

Now we consider an $\mathcal{F}$ that forges on one of its queried messages (with probability at least $\delta/2$). Our reduction $\mathcal{S}$ simulates the attack to $\mathcal{F}$ as follows:

• Invoke $\mathcal{F}$ to receive up to $Q$ distinct messages $\mu^{(1)}, \mu^{(2)}, \ldots \in \{0, 1\}^{\ell}$. Choose one of these messages $\mu = \mu^{(i)}$ uniformly at random, “guessing” that the eventual forgery will be on $\mu$.

• Construct a verification key $vk = (\mathbf{A}, \mathbf{A}_0, \ldots, \mathbf{A}_\ell, \mathbf{u})$: generate $\mathbf{A}_i$ exactly as above, using $p = \mu$. Then choose $\mathbf{v} \leftarrow D_{\mathbb{Z}^m,\, s}$ and let $\mathbf{u} = \mathbf{A}_\mu \mathbf{v}$, where $\mathbf{A}_\mu$ is defined in the usual way.

• Generate signatures for the queried messages: for all the queries except $\mu$, proceed exactly as above (which is possible because all the queries are distinct and hence do not have $p = \mu$ as a prefix). For $\mu$, use $\mathbf{v}$ as the signature, which has the required distribution $D_{\Lambda^{\perp}_{\mathbf{u}}(\mathbf{A}_\mu),\, s}$ by construction.

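When $\mathcal{S}$ gives $vk$ and the signatures to $\mathcal{F}$, with probability at least $\delta/2 - \mathrm{negl}(n)$ the forger must output a forgery $(\mu^{*}, \mathbf{v}^{*})$ where $\mu^{*}$ is one of its queries, $\mathbf{v}^{*}$ is different from the corresponding signature it received, $\mathbf{A}_{\mu^{*}} \mathbf{v}^{*} = \mathbf{u}$, and $\|\mathbf{v}^{*}\| \leq s \cdot \sqrt{m}$. Because $vk$ and the signatures are appropriately distributed for any choice $\mu$ that $\mathcal{S}$ made, conditioned on the above event the probability that $\mu^{*} = \mu$ is at least $1/Q - \mathrm{negl}(n)$. Therefore, all of these events occur with probability at least $\delta/(2Q) - \mathrm{negl}(n)$.

In such a case, $\mathcal{S}$ extracts a solution to its SIS challenge from the forgery as follows. Because $\mu^{*} = \mu$, we have $\mathbf{A}_{\mu^{*}} = [\bar{\mathbf{A}} \mid \mathbf{B} \mid -\bar{\mathbf{A}}\mathbf{R}^{*}]$ for $\mathbf{R}^{*} = \mathbf{R}_0 + \sum_{i \in [\ell]} \mu^{*}_i \mathbf{R}_i$, and so

\[ [\bar{\mathbf{A}} \mid \mathbf{B}] \underbrace{\begin{bmatrix} \mathbf{I}_{\bar{m}} & & -\mathbf{R}^{*} \\ & \mathbf{I}_{nk} & \end{bmatrix} (\mathbf{v}^{*} - \mathbf{v})}_{\mathbf{z}} = \mathbf{0} \bmod q. \]

Because both $\|\mathbf{v}^{*}\|, \|\mathbf{v}\| \leq s \cdot \sqrt{m} = O(\sqrt{\ell}\, nk) \cdot \omega(\sqrt{\log n})^2$ and $s_1(\mathbf{R}^{*}) = O(\sqrt{\ell nk}) \cdot \omega(\sqrt{\log n})$ with overwhelming probability (conditioned on the view of $\mathcal{F}$ and any fixed $\mathbf{H}_i$), we have $\|\mathbf{z}\| = O(\ell (nk)^{3/2}) \cdot \omega(\sqrt{\log n})^3$ with overwhelming probability, as needed. It just remains to show that $\mathbf{z} \neq \mathbf{0}$ with overwhelming probability. To see this, write $\mathbf{w} = \mathbf{v}^{*} - \mathbf{v} = (\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3) \in \mathbb{Z}^{\bar{m}} \times \mathbb{Z}^{nk} \times \mathbb{Z}^{nk}$, with $\mathbf{w} \neq \mathbf{0}$. If $\mathbf{w}_2 \neq \mathbf{0}$ or $\mathbf{w}_3 = \mathbf{0}$, then $\mathbf{z} \neq \mathbf{0}$ and we are done. Otherwise, choose some entry of $\mathbf{w}_3$ that is nonzero; without loss of generality say it is $w_m$. Let $\mathbf{r} = (\mathbf{R}_0)_{nk}$. Now for any fixed values of $\mathbf{R}_i$ for $i \in [\ell]$ and fixed first $nk - 1$ columns of $\mathbf{R}_0$, we have $\mathbf{z} = \mathbf{0}$ only if $\mathbf{r} \cdot w_m = \mathbf{y} \in \mathbb{R}^{\bar{m}}$ for some fixed $\mathbf{y}$. Conditioned on the adversary’s view (specifically, $(\mathbf{A}_0)_{nk} = \bar{\mathbf{A}}\mathbf{r}$), $\mathbf{r}$ is distributed as a discrete Gaussian of parameter $\geq 2\eta_{\epsilon}(\Lambda^{\perp}(\bar{\mathbf{A}}))$ for some $\epsilon = \mathrm{negl}(n)$ over a coset of $\Lambda^{\perp}(\bar{\mathbf{A}})$. Then by Lemma 2.7 we have $\mathbf{r} = \mathbf{y}/w_m$ with only negligible probability, and we are done. □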


6.3  Chosen  Ciphertext-Secure  Encryption 

Definitions. A public-key cryptosystem for a message space $\mathcal{M}$ (which may depend on the security parameter) is a tuple of algorithms as follows:

• $\mathsf{Gen}(1^n)$ outputs a public encryption key $pk$ and a secret decryption key $sk$.

• $\mathsf{Enc}(pk, m)$, given a public key $pk$ and a message $m \in \mathcal{M}$, outputs a ciphertext $c \in \{0, 1\}^{*}$.

• $\mathsf{Dec}(sk, c)$, given a decryption key $sk$ and a ciphertext $c$, outputs some $m \in \mathcal{M} \cup \{\perp\}$.

The correctness requirement is: for any $m \in \mathcal{M}$, generate $(pk, sk) \leftarrow \mathsf{Gen}(1^n)$ and $c \leftarrow \mathsf{Enc}(pk, m)$. Then $\mathsf{Dec}(sk, c)$ should output $m$ with overwhelming probability (over all the randomness in the experiment).

We recall the two notions of security under chosen-ciphertext attacks. We start with the weaker notion of CCA1 (or “lunchtime”) security. Let $\mathcal{A}$ be any nonuniform probabilistic polynomial-time algorithm. First, we generate $(pk, sk) \leftarrow \mathsf{Gen}(1^n)$ and give $pk$ to $\mathcal{A}$. Next, we give $\mathcal{A}$ oracle access to the decryption procedure $\mathsf{Dec}(sk, \cdot)$. Next, $\mathcal{A}$ outputs two messages $m_0, m_1 \in \mathcal{M}$ and is given a challenge ciphertext $c \leftarrow \mathsf{Enc}(pk, m_b)$ for either $b = 0$ or $b = 1$. The scheme is CCA1-secure if the views of $\mathcal{A}$ (i.e., the public key $pk$, the answers to its oracle queries, and the ciphertext $c$) for $b = 0$ versus $b = 1$ are computationally indistinguishable (i.e., $\mathcal{A}$’s acceptance probabilities for $b = 0$ versus $b = 1$ differ by only $\mathrm{negl}(n)$). In the stronger CCA2 notion, after receiving the challenge ciphertext, $\mathcal{A}$ continues to have access to the decryption oracle $\mathsf{Dec}(sk, \cdot)$ for any query not equal to the challenge ciphertext $c$; security is defined similarly.


Construction. To highlight the main new ideas, here we present a public-key encryption scheme that is CCA1-secure. Full CCA2 security can be obtained via relatively generic transformations using either strongly unforgeable one-time signatures [DDN00], or a message authentication code and weak form of commitment [BCHK07]; we omit these details.

Our scheme involves a number of parameters, for which we give some exemplary asymptotic bounds. In what follows, $\omega(\sqrt{\log n})$ represents a fixed function that asymptotically grows faster than $\sqrt{\log n}$.

• $\mathbf{G} \in \mathbb{Z}_q^{n \times nk}$ is a gadget matrix for large enough prime power $q = p^e = \mathrm{poly}(n)$ and $k = O(\log q) = O(\log n)$. We require an oracle $\mathcal{O}$ that solves LWE with respect to $\Lambda(\mathbf{G}^t)$ for any error vector in some $\mathcal{P}_{1/2}(q \cdot \mathbf{B}^{-t})$ where $\|\mathbf{B}\| = O(1)$. (See for example the constructions from Section 4.)

• $\bar{m} = O(nk)$ and $\mathcal{D} = D_{\mathbb{Z},\, \omega(\sqrt{\log n})}^{\bar{m} \times nk}$ so that $(\bar{\mathbf{A}}, \bar{\mathbf{A}}\mathbf{R})$ is $\mathrm{negl}(n)$-far from uniform for $\bar{\mathbf{A}} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$ and $\mathbf{R} \leftarrow \mathcal{D}$, and $m = \bar{m} + nk$ is the total dimension of the public key and ciphertext.

• $\alpha$ is an error rate for LWE, for sufficiently large $1/\alpha = O(nk) \cdot \omega(\sqrt{\log n})$.


Our scheme requires a special collection of elements in the ring $\mathcal{R} = \mathbb{Z}_q[x]/(f(x))$ as constructed in Section 6.1 (recall that here $q = p^e$). We need a very large set $\mathcal{U} = \{u_1, \ldots, u_\ell\} \subset \mathcal{R}$ with the “unit differences” property: for any $i \neq j$, the difference $u_i - u_j \in \mathcal{R}^{*}$, and hence $h(u_i - u_j) = h(u_i) - h(u_j) \in \mathbb{Z}_q^{n \times n}$ is invertible. (Note that the $u_i$s need not all be units themselves.) Concretely, by the characterization of units in $\mathcal{R}$ given above, we take $\mathcal{U}$ to be all linear combinations of the monomials $1, x, \ldots, x^{n-1}$ with coefficients in $\{0, \ldots, p - 1\}$, of which there are exactly $p^n$. Since the difference between any two such distinct elements is nonzero modulo $p$, it is a unit.
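
The “unit differences” property can be confirmed exhaustively for a tiny instance. The sketch below assumes toy parameters (for example p = 2, e = 2, so q = 4, and f(x) = x^2 + x + 1, which is irreducible modulo 2) and decides whether an element is a unit by brute-force search for an inverse among all q^n ring elements. The helper names and the representation of f by its non-leading coefficients are illustrative.

```python
from itertools import product

def ring_mul(a, b, f, q):
    """a * b in Z_q[x]/(f); f lists the non-leading coefficients f_0..f_{n-1} of monic f."""
    n = len(f)
    prod = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    # Reduce high powers using x^n = -(f_0 + f_1 x + ... + f_{n-1} x^{n-1}).
    for d in range(2 * n - 2, n - 1, -1):
        top, prod[d] = prod[d], 0
        for i, fi in enumerate(f):
            prod[d - n + i] -= top * fi
    return [c % q for c in prod[:n]]

def is_unit(a, f, q):
    """Brute-force search for an inverse of a among all q^n ring elements."""
    n = len(f)
    one = [1] + [0] * (n - 1)
    return any(ring_mul(a, list(b), f, q) == one
               for b in product(range(q), repeat=n))

def unit_differences(p, e, f):
    """Check that all pairwise differences in U (coefficients in 0..p-1) are units mod p^e."""
    q, n = p ** e, len(f)
    U = list(product(range(p), repeat=n))
    return all(is_unit([x - y for x, y in zip(u, v)], f, q)
               for u in U for v in U if u != v)
```

The exhaustive search is only feasible for toy sizes. In the scheme itself the property follows from the characterization of units in Section 6.1 and needs no search.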

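The system has message space $\{0, 1\}^{nk}$, which we map bijectively to the cosets of $\Lambda/2\Lambda$ for $\Lambda = \Lambda(\mathbf{G}^t)$ via some function encode that is efficient to evaluate and invert. Concretely, letting $\mathbf{S} \in \mathbb{Z}^{nk \times nk}$ be any basis of $\Lambda$, we can map $\mathbf{m} \in \{0, 1\}^{nk}$ to $\mathrm{encode}(\mathbf{m}) = \mathbf{S}\mathbf{m} \in \mathbb{Z}^{nk}$.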


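• $\mathsf{Gen}(1^n)$: choose $\bar{\mathbf{A}} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$ and $\mathbf{R} \leftarrow \mathcal{D}$, letting $\mathbf{A}_1 = -\bar{\mathbf{A}}\mathbf{R} \bmod q$. The public key is $pk = \mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{A}_1] \in \mathbb{Z}_q^{n \times m}$ and the secret key is $sk = \mathbf{R}$.

• $\mathsf{Enc}(pk = [\bar{\mathbf{A}} \mid \mathbf{A}_1], \mathbf{m} \in \{0, 1\}^{nk})$: choose nonzero $u \leftarrow \mathcal{U}$ and let $\mathbf{A}_u = [\bar{\mathbf{A}} \mid \mathbf{A}_1 + h(u)\mathbf{G}]$. Choose $\mathbf{s} \leftarrow \mathbb{Z}_q^n$, $\bar{\mathbf{e}} \leftarrow D_{\mathbb{Z},\, \alpha q}^{\bar{m}}$, and $\mathbf{e}_1 \leftarrow D_{\mathbb{Z},\, s}^{nk}$ where $s^2 = (\|\bar{\mathbf{e}}\|^2 + \bar{m}(\alpha q)^2) \cdot \omega(\sqrt{\log n})^2$. Let

\[ \mathbf{b}^t = 2(\mathbf{s}^t \mathbf{A}_u \bmod q) + \mathbf{e}^t + (\mathbf{0}, \mathrm{encode}(\mathbf{m}))^t \bmod 2q, \]

where $\mathbf{e} = (\bar{\mathbf{e}}, \mathbf{e}_1) \in \mathbb{Z}^m$ and $\mathbf{0}$ has dimension $\bar{m}$. (Note the use of mod-$2q$ arithmetic: $2(\mathbf{s}^t \mathbf{A}_u \bmod q)$ is an element of the lattice $2\Lambda(\mathbf{A}_u^t) \supseteq 2q\mathbb{Z}^m$.) Output the ciphertext $c = (u, \mathbf{b}) \in \mathcal{U} \times \mathbb{Z}_{2q}^m$.

• $\mathsf{Dec}(sk = \mathbf{R}, c = (u, \mathbf{b}) \in \mathcal{U} \times \mathbb{Z}_{2q}^m)$: Let $\mathbf{A}_u = [\bar{\mathbf{A}} \mid \mathbf{A}_1 + h(u)\mathbf{G}] = [\bar{\mathbf{A}} \mid h(u)\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$.

1. If $c$ does not parse or $u = 0$, output $\perp$. Otherwise, call $\mathsf{Invert}^{\mathcal{O}}(\mathbf{R}, \mathbf{A}_u, \mathbf{b} \bmod q)$ to get values $\mathbf{z} \in \mathbb{Z}_q^n$ and $\mathbf{e} = (\bar{\mathbf{e}}, \mathbf{e}_1) \in \mathbb{Z}^{\bar{m}} \times \mathbb{Z}^{nk}$ for which $\mathbf{b}^t = \mathbf{z}^t \mathbf{A}_u + \mathbf{e}^t \bmod q$. (Note that $h(u) \in \mathbb{Z}_q^{n \times n}$ is invertible, as required by Invert.) If the call to Invert fails for any reason, output $\perp$.

2. If $\|\bar{\mathbf{e}}\| \geq \alpha q \sqrt{\bar{m}}$ or $\|\mathbf{e}_1\| \geq \alpha q \sqrt{2\bar{m}nk} \cdot \omega(\sqrt{\log n})$, output $\perp$.

3. Let $\mathbf{v} = \mathbf{b} - \mathbf{e} \bmod 2q$, parsed as $\mathbf{v} = (\bar{\mathbf{v}}, \mathbf{v}_1) \in \mathbb{Z}_{2q}^{\bar{m}} \times \mathbb{Z}_{2q}^{nk}$. If $\bar{\mathbf{v}} \notin 2\Lambda(\bar{\mathbf{A}}^t)$, output $\perp$. Finally, output $\mathrm{encode}^{-1}\left(\mathbf{v}^t \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \bmod 2q\right) \in \{0, 1\}^{nk}$ if it exists, otherwise output $\perp$.

(In practice, to avoid timing attacks one would perform all of the Dec operations first, and only then finally output $\perp$ if any of the validity tests failed.)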

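Lemma 6.2. The above scheme has only $2^{-\Omega(n)}$ probability of decryption error.


The error probability can be made zero by changing Gen and Enc so that they resample $\mathbf{R}$, $\bar{\mathbf{e}}$, and/or $\mathbf{e}_1$ in the rare event that they violate the corresponding bounds given in the proof below.

Proof. Let $(\mathbf{A}, \mathbf{R}) \leftarrow \mathsf{Gen}(1^n)$. By Lemma 2.9 we have $s_1(\mathbf{R}) \leq O(\sqrt{nk}) \cdot \omega(\sqrt{\log n})$ except with probability $2^{-\Omega(n)}$. Now consider the random choices made by $\mathsf{Enc}(\mathbf{A}, \mathbf{m})$ for arbitrary $\mathbf{m} \in \{0, 1\}^{nk}$. By Lemma 2.6 we have both $\|\bar{\mathbf{e}}\| < \alpha q \sqrt{\bar{m}}$ and $\|\mathbf{e}_1\| < \alpha q \sqrt{2\bar{m}nk} \cdot \omega(\sqrt{\log n})$, except with probability $2^{-\Omega(n)}$. Letting $\mathbf{e} = (\bar{\mathbf{e}}, \mathbf{e}_1)$, we have

\[ \left\| \mathbf{e}^t \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \right\| \leq \|\bar{\mathbf{e}}^t \mathbf{R}\| + \|\mathbf{e}_1\| < \alpha q \cdot O(nk) \cdot \omega(\sqrt{\log n}). \]

In particular, for large enough $1/\alpha = O(nk) \cdot \omega(\sqrt{\log n})$ we have $\mathbf{e}^t \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \in \mathcal{P}_{1/2}(q \cdot \mathbf{B}^{-t})$. Therefore, the call to Invert made by $\mathsf{Dec}(\mathbf{R}, (u, \mathbf{b}))$ returns $\mathbf{e}$. It follows that for $\mathbf{v} = (\bar{\mathbf{v}}, \mathbf{v}_1) = \mathbf{b} - \mathbf{e} \bmod 2q$, we have $\bar{\mathbf{v}} \in 2\Lambda(\bar{\mathbf{A}}^t)$ as needed. Finally,

\[ \mathbf{v}^t \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] = 2(\mathbf{s}^t h(u)\mathbf{G} \bmod q) + \mathrm{encode}(\mathbf{m}) \bmod 2q, \]

which is in the coset $\mathrm{encode}(\mathbf{m}) \in \Lambda(\mathbf{G}^t)/2\Lambda(\mathbf{G}^t)$, and so Dec outputs $\mathbf{m}$ as desired. □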


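Theorem 6.3. The above scheme is CCA1-secure assuming the hardness of decision-$\mathrm{LWE}_{q,\alpha'}$ for $\alpha' = \alpha/3 \geq 2\sqrt{n}/q$.


Proof. We start by giving a particular form of discretized LWE that we will need below. Given access to an LWE distribution $A_{\mathbf{s},\alpha'}$ over $\mathbb{Z}_q^n \times \mathbb{T}$ for any $\mathbf{s} \in \mathbb{Z}_q^n$ (where recall that $\mathbb{T} = \mathbb{R}/\mathbb{Z}$), by [Pei10, Theorem 3.1] we can transform its samples $(\mathbf{a}, b = \langle \mathbf{s}, \mathbf{a} \rangle / q + e \bmod 1)$ to have the form $(\mathbf{a}, 2(\langle \mathbf{s}, \mathbf{a} \rangle \bmod q) + e' \bmod 2q)$ for $e' \leftarrow D_{\mathbb{Z},\, \alpha q}$, by mapping $b \mapsto 2qb + D_{\mathbb{Z} - 2qb,\, s} \bmod 2q$ where $s^2 = (\alpha q)^2 - (2\alpha' q)^2 \geq 4n \geq \eta_{\epsilon}(\mathbb{Z})^2$. This transformation maps the uniform distribution over $\mathbb{Z}_q^n \times \mathbb{T}$ to the uniform distribution over $\mathbb{Z}_q^n \times \mathbb{Z}_{2q}$, so the discretized distribution is pseudorandom under the hypothesis of the theorem.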
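
The discretization map $b \mapsto 2qb + D_{\mathbb{Z} - 2qb,\, s} \bmod 2q$ can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the cited theorem: the discrete Gaussian is sampled by simple tail-cut rejection sampling (a swapped-in technique chosen for brevity), the Gaussian function is taken to be $\exp(-\pi x^2/s^2)$, and the parameters in the usage below are toy values that do not satisfy the constraints on $\alpha$ and $\alpha'$.

```python
import math
import random

def sample_dgauss_int(center, s, rng, tail=12):
    """Integer y with probability proportional to exp(-pi (y - center)^2 / s^2), tail-cut."""
    lo = math.floor(center - tail * s)
    hi = math.ceil(center + tail * s)
    while True:
        y = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (y - center) ** 2 / s ** 2):
            return y

def discretize(b, q, s, rng):
    """Map b in T = R/Z to 2qb + D_{Z - 2qb, s} mod 2q, an element of Z_{2q}.

    Sampling an integer y centered at 2qb is the same as adding a sample
    from the coset Z - 2qb to 2qb, so the result is always an integer.
    """
    return sample_dgauss_int(2 * q * b, s, rng) % (2 * q)
```

For a noise-free input $b = k/q$, the output is $2k$ plus a small integer error modulo $2q$, matching the target form $2(\langle \mathbf{s}, \mathbf{a} \rangle \bmod q) + e'$.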

We proceed via a sequence of hybrid games. The game $H_0$ is exactly the CCA1 attack with the system described above.

In game $H_1$, we change how the public key $\mathbf{A}$ and challenge ciphertext $c^{*} = (u^{*}, \mathbf{b}^{*})$ are constructed, and the way that decryption queries are answered (slightly), but in a way that introduces only $\mathrm{negl}(n)$ statistical difference with $H_0$. At the start of the experiment we choose nonzero $u^{*} \leftarrow \mathcal{U}$ and let the public key be $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{A}_1] = [\bar{\mathbf{A}} \mid -h(u^{*})\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$, where $\bar{\mathbf{A}}$ and $\mathbf{R}$ are chosen in the same way as in $H_0$. (In particular, we still have $s_1(\mathbf{R}) \leq O(\sqrt{nk}) \cdot \omega(\sqrt{\log n})$ with overwhelming probability.) Note that $\mathbf{A}$ is still $\mathrm{negl}(n)$-uniform for any choice of $u^{*}$, so conditioned on any fixed choice of $\mathbf{A}$, the value of $u^{*}$ is statistically hidden from the attacker. To aid with decryption queries, we also choose an arbitrary (not necessarily short) $\hat{\mathbf{R}} \in \mathbb{Z}^{\bar{m} \times nk}$ such that $\mathbf{A}_1 = -\bar{\mathbf{A}}\hat{\mathbf{R}} \bmod q$.

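To answer a decryption query on a ciphertext $(u, \mathbf{b})$, we use an algorithm very similar to Dec with
trapdoor $\mathbf{R}$. After testing whether $u = 0$ (and outputting $\bot$ if so), we call $\mathsf{Invert}^{\mathcal{O}}(\mathbf{R}, \mathbf{A}_u, \mathbf{b} \bmod q)$ to get
some $\mathbf{z} \in \mathbb{Z}_q^n$ and $\mathbf{e} \in \mathbb{Z}^m$, where

$$\mathbf{A}_u = [\bar{\mathbf{A}} \mid \mathbf{A}_1 + h(u)\mathbf{G}] = [\bar{\mathbf{A}} \mid h(u - u^*)\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}].$$

(If Invert fails, we output $\bot$.) We then perform steps 2 and 3 on $\mathbf{e} \in \mathbb{Z}^m$ and $\mathbf{v} = \mathbf{b} - \mathbf{e} \bmod 2q$ exactly as
in Dec, except that we use $\hat{\mathbf{R}}$ in place of $\mathbf{R}$ when decoding the message in step 3.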

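We now analyze the behavior of this decryption routine. Whenever $u \neq u^*$, which is the case with
overwhelming probability because $u^*$ is statistically hidden, by the “unit differences” property on $\mathcal{U}$ we have
that $h(u - u^*) \in \mathbb{Z}_q^{n \times n}$ is invertible, as required by the call to Invert. Now, either there exists an $\mathbf{e}$ that
satisfies the validity tests in step 2 and such that $\mathbf{b}^t = \mathbf{z}^t \mathbf{A}_u + \mathbf{e}^t \bmod q$ for some $\mathbf{z} \in \mathbb{Z}_q^n$, or there does not.
In the latter case, no matter what Invert does in $H_0$ and $H_1$, step 2 will return $\bot$ in both games. Now consider
the former case: by the constraints on $\mathbf{e}$, we have $\mathbf{e}^t \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \in \mathcal{P}_{1/2}(q \cdot \mathbf{B}^{-t})$ in both games, so the call to Invert
must return this $\mathbf{e}$ (but possibly different $\mathbf{z}$) in both games. Finally, the result of decryption is the same in
both games: if $\mathbf{v} \in 2\Lambda(\mathbf{A}_u^t)$ (otherwise, both games return $\bot$), then we can express $\mathbf{v}$ as

$$\mathbf{v}^t = 2(\mathbf{s}^t \mathbf{A}_u \bmod q) + (\mathbf{0}, \mathbf{v}')^t \bmod 2q$$

for some $\mathbf{s} \in \mathbb{Z}_q^n$ and $\mathbf{v}' \in \mathbb{Z}_{2q}^{nk}$. Then for any solution $\mathbf{R} \in \mathbb{Z}^{\bar{m} \times nk}$ to $\mathbf{A}_1 = -\bar{\mathbf{A}}\mathbf{R} \bmod q$, we have

$$\mathbf{v}^t \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] = 2(\mathbf{s}^t h(u)\mathbf{G} \bmod q) + (\mathbf{v}')^t \bmod 2q.$$

In particular, this holds for the $\mathbf{R}$ in $H_0$ and the $\hat{\mathbf{R}}$ in $H_1$ that are used for decryption. It follows that both
games output $\mathrm{encode}^{-1}(\mathbf{v}')$, if it exists (and $\bot$ otherwise).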

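Finally, in $H_1$ we produce the challenge ciphertext $(u, \mathbf{b})$ on a message $\mathbf{m} \in \{0,1\}^{nk}$ as follows. Let
$u = u^*$, and choose $\mathbf{s} \leftarrow \mathbb{Z}_q^n$ and $\bar{\mathbf{e}} \leftarrow D_{\mathbb{Z},\alpha q}^{\bar{m}}$ as usual, but do not choose $\mathbf{e}_1$. Note that $\mathbf{A}_u = [\bar{\mathbf{A}} \mid -\bar{\mathbf{A}}\mathbf{R}]$.
Let $\bar{\mathbf{b}}^t = 2(\mathbf{s}^t \bar{\mathbf{A}} \bmod q) + \bar{\mathbf{e}}^t \bmod 2q$. Let

$$\mathbf{b}_1^t = -\bar{\mathbf{b}}^t \mathbf{R} + \hat{\mathbf{e}}^t + \mathrm{encode}(\mathbf{m}) \bmod 2q,$$

where $\hat{\mathbf{e}} \leftarrow D_{\mathbb{Z},\, \alpha q \sqrt{\bar{m}} \cdot \omega(\sqrt{\log n})}^{nk}$, and output $(u, \mathbf{b} = (\bar{\mathbf{b}}, \mathbf{b}_1))$. We now show that the distribution of $(u, \mathbf{b})$
is within negl(n) statistical distance of that in $H_0$, given the attacker’s view (i.e., pk and the results of
the decryption queries). Clearly, $u$ and $\bar{\mathbf{b}}$ have essentially the same distribution as in $H_0$, because $u$ is
negl(n)-uniform given pk, and by construction of $\bar{\mathbf{b}}$. By substitution, we have

$$\mathbf{b}_1^t = 2(\mathbf{s}^t(-\bar{\mathbf{A}}\mathbf{R}) \bmod q) + (\bar{\mathbf{e}}^t \mathbf{R} + \hat{\mathbf{e}}^t) + \mathrm{encode}(\mathbf{m}).$$

Therefore, it suffices to show that for fixed $\bar{\mathbf{e}}$, each $\langle \bar{\mathbf{e}}, \mathbf{r}_i \rangle + \hat{e}_i$ has distribution negl(n)-far from $D_{\mathbb{Z},s}$, where
$s^2 = (\|\bar{\mathbf{e}}\|^2 + \bar{m}(\alpha q)^2) \cdot \omega(\sqrt{\log n})^2$, over the random choice of $\mathbf{r}_i$ (conditioned on the value of $\bar{\mathbf{A}}\mathbf{r}_i$ from
the public key) and of $\hat{e}_i$. Because each $\mathbf{r}_i$ is an independent discrete Gaussian over a coset of $\Lambda^{\perp}(\bar{\mathbf{A}})$, the
claim follows essentially by [Reg05, Corollary 3.10], but adapted to discrete random variables using [Pei10,
Theorem 3.1] in place of [Reg05, Claim 3.9].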

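In game $H_2$, we only change how the $\bar{\mathbf{b}}$ component of the challenge ciphertext is created, letting it be
uniformly random in $\mathbb{Z}_{2q}^{\bar{m}}$. We construct pk, answer decryption queries, and construct $\mathbf{b}_1$ in exactly the
same way as in $H_1$. First observe that under our (discretized) LWE hardness assumption, games $H_1$ and
$H_2$ are computationally indistinguishable by an elementary reduction: given $(\bar{\mathbf{A}}, \bar{\mathbf{b}}) \in \mathbb{Z}_q^{n \times \bar{m}} \times \mathbb{Z}_{2q}^{\bar{m}}$ where
$\bar{\mathbf{A}}$ is uniformly random and either $\bar{\mathbf{b}}^t = 2(\mathbf{s}^t \bar{\mathbf{A}} \bmod q) + \bar{\mathbf{e}}^t \bmod 2q$ (for $\mathbf{s} \leftarrow \mathbb{Z}_q^n$ and $\bar{\mathbf{e}} \leftarrow D_{\mathbb{Z},\alpha q}^{\bar{m}}$) or $\bar{\mathbf{b}}$
is uniformly random, we can efficiently emulate either game $H_1$ or $H_2$ (respectively) by doing everything
exactly as in the two games, except using the given $\bar{\mathbf{A}}$ and $\bar{\mathbf{b}}$ when constructing the public key and challenge
ciphertext.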


Now by the leftover hash lemma, $(\bar{\mathbf{A}}, \bar{\mathbf{b}}^t, \bar{\mathbf{A}}\mathbf{R}, -\bar{\mathbf{b}}^t\mathbf{R})$ is negl(n)-uniform when $\mathbf{R}$ is chosen as in $H_2$.
Therefore, the challenge ciphertext has the same distribution (up to negl(n) statistical distance) for any
encrypted message, and so the adversary’s advantage is negligible. This completes the proof. □
Abstract

We give direct constructions of pseudorandom function (PRF) families based on conjectured hard
lattice problems and learning problems. Our constructions are asymptotically efficient and highly
parallelizable in a practical sense, i.e., they can be computed by simple, relatively small low-depth
arithmetic or boolean circuits (e.g., in NC^1 or even TC^0). In addition, they are the first low-depth
PRFs that have no known attack by efficient quantum algorithms. Central to our results is a new
“derandomization” technique for the learning with errors (LWE) problem which, in effect, generates the
error terms deterministically.


1  Introduction  and  Main  Results 


The  past  few  years  have  seen  significant  progress  in  constructing  public -key,  identity-based,  and  homomorphic 
cryptographic schemes using lattices, e.g., [Reg05, PW08, GPV08, Gen09, CHKP10, ABB10a] and many
more. Part of their appeal stems from provable worst-case hardness guarantees (starting with the seminal
work of Ajtai [Ajt96]), good asymptotic efficiency and parallelism, and apparent resistance to quantum
attacks  (unlike  the  classical  problems  of  factoring  integers  or  computing  discrete  logarithms). 

Perhaps surprisingly, there has been comparatively less progress in using lattices for symmetric
cryptography, e.g., message authentication codes, block ciphers, and the like, which are widely used in practice.
While  in  principle  most  symmetric  objects  of  interest  can  be  obtained  generically  from  any  one-way  function, 
and  hence  from  lattices,  these  generic  constructions  are  usually  very  inefficient,  which  puts  them  at  odds 
with  the  high  performance  demands  of  most  applications.  In  addition,  generic  constructions  often  use  their 
underlying  primitives  (e.g.,  one-way  functions)  in  an  inherently  inefficient  and  sequential  manner.  While 
most  lattice-based  primitives  are  relatively  efficient  and  highly  parallelizable  in  a  practical  sense  (i.e.,  they 
can  be  evaluated  by  small,  low-depth  circuits),  those  advantages  are  completely  lost  when  plugging  them 
into  generic  sequential  constructions.  This  motivates  the  search  for  specialized  constructions  of  symmetric 
objects  that  have  comparable  efficiency  and  parallelism  to  their  lower-level  counterparts. 

Our focus in this work is on pseudorandom function (PRF) families, a central object in symmetric
cryptography first rigorously defined and constructed by Goldreich, Goldwasser, and Micali (“GGM”) [GGM84].


* School of Computer Science, Georgia Institute of Technology. Email: abhishek.banerjee@cc.gatech.edu. Research
supported in part by an ARC Fellowship.

† School of Computer Science, Georgia Institute of Technology. Email: cpeikert@cc.gatech.edu. This material is based
upon work supported by the National Science Foundation under Grant CNS-0716786 and CAREER Award CCF-1054495, and by
the Alfred P. Sloan Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of
the authors and do not necessarily reflect the views of the National Science Foundation.

‡ Efi Arazi School of Computer Science, IDC Herzliya. Email: alon.rosen@idc.ac.il. Research supported in part by
BSF grant 2010296.




Approved  for  Public  Release;  Distribution  Unlimited. 


Given a PRF family, most central goals of symmetric cryptography (e.g., encryption, authentication,
identification) have simple solutions that make efficient use of the PRF. Informally, a family of deterministic functions
is pseudorandom if no efficient adversary, given adaptive oracle access to a randomly chosen function from
the family, can distinguish it from a uniformly random function. The seminal GGM construction is based
generically on any length-doubling pseudorandom generator (and hence on any one-way function), but it
requires k sequential invocations of the generator when operating on k-bit inputs.

In contrast, by relying on a generic object called a “pseudorandom synthesizer,” or directly on concrete
number-theoretic problems (such as decision Diffie-Hellman, RSA, and factoring), Naor and Reingold [NR95,
NR97] and Naor, Reingold, and Rosen [NRR00] (see also [LW09, BMR10]) constructed very elegant and
more efficient PRFs, which can in principle be computed in parallel by low-depth circuits (e.g., in NC^2
or TC^0). However, achieving such low depth for their number-theoretic constructions requires extensive
preprocessing and enormous circuits, so their results serve mainly as a proof of theoretical feasibility rather
than practical utility.

In  summary,  thus  far  all  parallelizable  PRFs  from  commonly  accepted  cryptographic  assumptions 
rely  on  exponentiation  in  large  multiplicative  groups,  and  the  functions  (or  at  least  their  underlying  hard 
problems)  can  be  broken  by  polynomial-time  quantum  algorithms.  While  lattices  appear  to  be  a  natural 
candidate  for  avoiding  these  drawbacks,  and  there  has  been  some  partial  progress  in  the  form  of  randomized 
weak PRFs [ACPS09] and randomized MACs [Pie10, KPC+11], constructing an efficient, parallelizable
(deterministic)  PRF  under  lattice  assumptions  has,  frustratingly,  remained  open  for  some  time  now. 


1.1  Results  and  Techniques 


In this work we give the first direct constructions of PRF families based on lattices, via the learning with
errors (LWE) [Reg05] and ring-LWE [LPR10] problems, and some new variants. Our constructions are
highly parallelizable in a practical sense, i.e., they can be computed by relatively small low-depth circuits, and
the runtimes are also potentially practical. (However, their performance and key sizes are still far from those
of heuristically designed functions like AES.) In addition, (at least) one of our constructions can be evaluated
in the circuit class TC^0 (i.e., constant-depth, poly-sized circuits with unbounded fan-in and threshold gates),
which asymptotically matches the shallowest known PRF constructions based on the decision Diffie-Hellman
and factoring problems [NR97, NRR00].

As a starting point, we recall that in their work introducing synthesizers as a foundation for PRFs [NR95],
Naor  and  Reingold  described  a  synthesizer  based  on  a  simple,  conjectured  hard-to-learn  function.  At  first 
glance,  this  route  seems  very  promising  for  obtaining  PRFs  from  lattices,  using  LWE  as  the  hard  learning 
problem (which is known to be as hard as worst-case lattice problems [Reg05, Pei09]). However, a crucial
point is that Naor and Reingold’s synthesizer uses a deterministic hard-to-learn function, whereas LWE’s
hardness  depends  essentially  on  adding  random,  independent  errors  to  every  output  of  a  mod-q  “parity” 
function.  (Indeed,  without  any  error,  parity  functions  are  trivially  easy  to  learn.)  Probably  the  main  obstacle 
so  far  in  constructing  efficient  lattice/LWE-based  PRFs  has  been  in  finding  a  way  to  introduce  (sufficiently 
independent)  error  terms  into  each  of  the  exponentially  many  function  outputs,  while  still  keeping  the 
function  deterministic  and  its  key  size  a  fixed  polynomial.  As  evidence,  consider  that  recent  constructions 
of weaker primitives such as symmetric authentication protocols [HB01, JW05, KSS06], randomized weak
PRFs [ACPS09], and message-authentication codes [Pie10, KPC+11] from noisy-learning problems are
all  inherently  randomized  functions,  where  security  relies  on  introducing  fresh  noise  at  every  invocation. 
Unfortunately,  this  is  not  an  option  for  deterministic  primitives  like  PRFs. 
Derandomizing LWE. To resolve the above-described issues, our first main insight is a way of partially
“derandomizing” the LWE problem, i.e., generating the errors efficiently and deterministically, while
preserving hardness. This technique immediately yields a deterministic synthesizer and hence a simple and
parallelizable  PRF,  though  with  a  few  subtleties  specific  to  our  technique  that  we  elaborate  upon  below. 

Before we explain the derandomization idea, first recall the learning with errors problem $\mathsf{LWE}_{n,q,\alpha}$ in
dimension $n$ (the main security parameter) with modulus $q$ and error rate $\alpha$. We are given many independent
pairs $(\mathbf{a}_i, b_i) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$, where each $\mathbf{a}_i$ is uniformly random, and the $b_i$ are all either “noisy inner products”
of the form $b_i = \langle \mathbf{a}_i, \mathbf{s} \rangle + e_i \bmod q$ for a random secret $\mathbf{s} \in \mathbb{Z}_q^n$ and “small” random error terms $e_i \in \mathbb{Z}$
of magnitude $\approx \alpha q$, or are uniformly random and independent of the $\mathbf{a}_i$. The goal of the (decision) LWE
problem is to distinguish between these two cases, with any non-negligible advantage. In the ring-LWE
problem [LPR10], we are instead given noisy ring products $b_i \approx a_i \cdot s$, where $s$ and the $a_i$ are random
elements of a certain polynomial ring $R_q$ (the canonical example being $R_q = \mathbb{Z}_q[z]/(z^n + 1)$ for $n$ a power
of 2), and the error terms are “small” in a certain basis of the ring; the goal again is to distinguish these
from uniformly random pairs. While the dimension $n$ is the main hardness parameter, the error rate $\alpha$ also
plays a very important role in both theory and practice: as long as the “absolute” error $\alpha q$ exceeds $\sqrt{n}$
or so, (ring-)LWE is provably as hard as approximating conjectured hard problems on (ideal) lattices to
within $\tilde{O}(n/\alpha)$ factors in the worst case [Reg05, Pei09, LPR10]. Moreover, known attacks using lattice
basis reduction (e.g., [LLL82, Sch87]) or combinatorial/algebraic methods [BKW03, AG11] require time
$2^{\tilde{\Omega}(n/\log(1/\alpha))}$, where the $\tilde{\Omega}(\cdot)$ notation hides polylogarithmic factors in $n$. We emphasize that without the
error terms, (ring-)LWE would become trivially easy, and that all prior hardness results for LWE and its many
variants (e.g., [Reg05, Pei09, GKPV10, LPR10, Pie10]) require random, independent errors.

Our derandomization technique for LWE is very simple: instead of adding a small random error term to
each inner product $\langle \mathbf{a}_i, \mathbf{s} \rangle \in \mathbb{Z}_q$, we just deterministically round it to the nearest element of a sufficiently
“coarse” public subset of $p \ll q$ well-separated values in $\mathbb{Z}_q$ (e.g., a subgroup). In other words, the “error
term” comes solely from deterministically rounding $\langle \mathbf{a}_i, \mathbf{s} \rangle$ to a relatively nearby value. Since there are only $p$
possible rounded outputs in $\mathbb{Z}_q$, it is usually easier to view them as elements of $\mathbb{Z}_p$ and denote the rounded
value by $\lfloor \langle \mathbf{a}_i, \mathbf{s} \rangle \rceil_p \in \mathbb{Z}_p$. We call the problem of distinguishing such rounded inner products from uniform
samples the learning with rounding ($\mathsf{LWR}_{n,q,p}$) problem. Note that the problem can be hard only if $q > p$
(otherwise no error is introduced), that the “absolute” error is roughly $q/p$, and that the “error rate” relative
to $q$ (i.e., the analogue of $\alpha$ in the LWE problem) is on the order of $1/p$.
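
The rounding operation and an LWR sample can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names `round_to_p` and `lwr_sample`, the tie-breaking rule (ties round up), and the toy parameters are assumptions made here.

```python
import random

def round_to_p(x, q, p):
    """Map x in Z_q to Z_p: nearest integer to (p/q)*x (ties round up), reduced mod p."""
    # Integer-only form of floor((p/q)*x + 1/2); this tie rule is an assumption of the sketch.
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def lwr_sample(s, q, p, rng):
    """Return one sample (a, round(<a, s> mod q)_p) for a uniformly random a in Z_q^n."""
    a = [rng.randrange(q) for _ in s]
    inner = sum(ai * si for ai, si in zip(a, s)) % q
    return a, round_to_p(inner, q, p)

if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed for reproducibility only
    n, q, p = 8, 2**12, 2**4        # toy sizes, far too small for any security
    secret = [rng.randrange(q) for _ in range(n)]
    for _ in range(3):
        print(lwr_sample(secret, q, p, rng))
```

Note that no error term is ever sampled: the only "noise" is the information discarded by `round_to_p`.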

We show that for appropriate parameters, $\mathsf{LWR}_{n,q,p}$ is at least as hard as $\mathsf{LWE}_{n,q,\alpha}$ for an error rate $\alpha$
proportional to $1/p$, giving us a worst-case hardness guarantee for LWR. In essence, the reduction relies
on the fact that with high probability, we have $\lfloor \langle \mathbf{a}, \mathbf{s} \rangle + e \rceil_p = \lfloor \langle \mathbf{a}, \mathbf{s} \rangle \rceil_p$ when $e$ is small relative to $q/p$,
while $\lfloor U(\mathbb{Z}_q) \rceil_p \approx U(\mathbb{Z}_p)$ where $U$ denotes the uniform distribution. Therefore, given samples $(\mathbf{a}_i, b_i)$
of an unknown type (either LWE or uniform), we can simply round the $b_i$ terms to generate samples of a
corresponding type (LWR or uniform, respectively). (The formal proof is somewhat more involved, because
it has to deal with the rare event that the error term changes the rounded value.) In the ring setting, the
derandomization technique and hardness proof based on ring-LWE all go through without difficulty as
well. While our proof needs both the ratio $q/p$ and the inverse LWE error rate $1/\alpha$ to be slightly
super-polynomial in $n$, the state of the art in attack algorithms indicates that as long as $q/p$ is an integer (so that
$\lfloor U(\mathbb{Z}_q) \rceil_p = U(\mathbb{Z}_p)$) and is at least $\Omega(\sqrt{n})$, LWR may be exponentially hard (even for quantum algorithms)
for any $p = \mathrm{poly}(n)$, and superpolynomially hard when $p = 2^{n^{\epsilon}}$ for any $\epsilon < 1$.
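
To make the "rare event" in the reduction concrete, the sketch below counts, by exhaustive search over a toy modulus, how many values in Z_q can have their rounded image changed by an additive error of absolute value at most `bound`. The function names and the bounded (rather than Gaussian) error model are assumptions of this illustration, not part of the formal proof.

```python
def round_to_p(x, q, p):
    """Map x in Z_q to Z_p: nearest integer to (p/q)*x (ties round up), reduced mod p."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def count_affected(q, p, bound):
    """Count x in Z_q for which some error e with |e| <= bound changes the rounded value."""
    affected = 0
    for x in range(q):
        base = round_to_p(x, q, p)
        if any(round_to_p(x + e, q, p) != base for e in range(-bound, bound + 1)):
            affected += 1
    return affected

if __name__ == "__main__":
    q, p, bound = 1024, 4, 3
    bad = count_affected(q, p, bound)
    print(bad, bad / q)
```

When p divides q and q/p exceeds 2·bound, only the 2·bound values adjacent to each of the p rounding boundaries are affected, so the affected fraction is 2·bound·p/q. This is why the argument wants the error to be small relative to q/p.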

We point out that in LWE-based cryptosystems, rounding to a fixed, coarse subset is a common method of
removing noise and recovering the plaintext when decrypting a “noisy” ciphertext; here we instead use it to
avoid having to introduce any random noise in the first place. We believe that this technique should be useful
in many other settings, especially in symmetric cryptography. For example, the LWR problem immediately
yields a simple and practical pseudorandom generator that does not require extracting biased (e.g., Gaussian)
random values from its input seed, unlike the standard pseudorandom generators based on the LWE or LPN
(learning parity with noise) problems. In addition, the rounding technique and its implications for PRFs are
closely related to the “modulus reduction” technique from a concurrent and independent work of Brakerski
and Vaikuntanathan [BV11a] on fully homomorphic encryption from LWE, and a very recent follow-up work
of Brakerski, Gentry, and Vaikuntanathan [BGV11]; see below for a discussion and comparison.

LWR-based synthesizers and PRFs. Recall from [NR95] that a pseudorandom synthesizer is a
two-argument function $S(\cdot, \cdot)$ such that, for random and independent sequences $x_1, \ldots, x_m$ and $y_1, \ldots, y_m$
of inputs (for any $m = \mathrm{poly}(n)$), the matrix of all $m^2$ values $z_{i,j} = S(x_i, y_j)$ is pseudorandom (i.e.,
computationally indistinguishable from uniform). A synthesizer can be seen as an (almost) length-squaring
pseudorandom generator with good locality properties, in that it maps $2m$ random “seed” elements (the $x_i$
and $y_j$) to $m^2$ pseudorandom elements, and any component of its output depends on only two components of
the input seed.

Using synthesizers in a recursive tree-like construction, Naor and Reingold gave PRFs on $k$-bit inputs,
which can be computed using a total of about $k$ synthesizer evaluations, arranged nicely in only $\lg k$ levels
(depth). Essentially, the main idea is that given a synthesizer $S(\cdot, \cdot)$ and two independent PRF instances $F_0$
and $F_1$ on $t$ input bits each, one gets a PRF on $2t$ input bits, defined as

$$F(x_1 \cdots x_{2t}) = S\big(F_0(x_1 \cdots x_t),\; F_1(x_{t+1} \cdots x_{2t})\big). \quad (1.1)$$

The base case of a 1-bit PRF can trivially be implemented by returning one of two random strings in the
function’s secret key. Using particular NC^1 synthesizers based on a variety of both concrete and general
assumptions, Naor and Reingold therefore obtain $k$-bit PRFs in NC^2, i.e., having circuit depth $O(\log^2 k)$.
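
The input-doubling recursion of Equation (1.1) can be sketched generically. In the illustration below the synthesizer is passed in as a parameter; `tree_prf` and the symbolic `toy_synth` are hypothetical names, and `toy_synth` only records the shape of the computation and is in no way pseudorandom.

```python
def tree_prf(synth, one_bit_prfs, x):
    """Evaluate the recursion of Equation (1.1) on the bit list x.

    one_bit_prfs[i][b] is the value of the i-th 1-bit PRF on bit b;
    len(x) is assumed to be a power of two.
    """
    def rec(lo, hi):
        if hi - lo == 1:
            return one_bit_prfs[lo][x[lo]]      # base case: pick one of two key strings
        mid = (lo + hi) // 2
        return synth(rec(lo, mid), rec(mid, hi))
    return rec(0, len(x))

if __name__ == "__main__":
    calls = []

    def toy_synth(u, v):            # symbolic stand-in, NOT a secure synthesizer
        calls.append((u, v))
        return "S(" + u + "," + v + ")"

    keys = [("k%d0" % i, "k%d1" % i) for i in range(4)]
    print(tree_prf(toy_synth, keys, [1, 0, 0, 1]))
    print(len(calls), "synthesizer evaluations")
```

For k input bits the recursion makes k − 1 synthesizer calls arranged in lg k levels, and the two recursive calls at each node are independent and could run in parallel.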

We give a very simple and computationally efficient $\mathsf{LWR}_{n,q,p}$-based synthesizer $S_{n,q,p} : \mathbb{Z}_q^n \times \mathbb{Z}_q^n \to \mathbb{Z}_p$,
defined as

$$S_{n,q,p}(\mathbf{a}, \mathbf{s}) = \lfloor \langle \mathbf{a}, \mathbf{s} \rangle \rceil_p. \quad (1.2)$$

(In this and what follows, products of vectors or matrices over $\mathbb{Z}_q$ are always performed modulo $q$.)
Pseudorandomness of this synthesizer under LWR follows by a standard hybrid argument, using the fact that the $\mathbf{a}_i$
vectors given in the LWR problem are public. (In fact, the synthesizer outputs $S(\mathbf{a}_i, \mathbf{s}_j)$ are pseudorandom even
given the $\mathbf{a}_i$.) To obtain a PRF using the tree construction of [NR95], we need the synthesizer output length
to roughly match its input length, so we actually use the synthesizer $T_{n,q,p}(\mathbf{S}_1, \mathbf{S}_2) = \lfloor \mathbf{S}_1 \cdot \mathbf{S}_2 \rceil_p \in \mathbb{Z}_p^{n \times n}$ for
$\mathbf{S}_i \in \mathbb{Z}_q^{n \times n}$. Note that the matrix multiplication can be done with a constant-depth, size-$O(n^2)$ arithmetic
circuit over $\mathbb{Z}_q$. Or for better space and time complexity, we can instead use the ring-LWR synthesizer
$S_{R,q,p}(s_1, s_2) = \lfloor s_1 \cdot s_2 \rceil_p$, since the ring product $s_1 \cdot s_2 \in R_q$ is the same size as $s_1, s_2 \in R_q$. The ring
product can also be computed with a constant depth, size-$O(n^2)$ circuit over $\mathbb{Z}_q$, or in $O(\log n)$ depth and
only $O(n \log n)$ scalar operations using Fast Fourier Transform-like techniques [LMPR08, LPR10].
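
As an illustration of the ring-LWR synthesizer, the sketch below multiplies two elements of the canonical ring Z_q[z]/(z^n + 1) with the quadratic schoolbook method and rounds each coefficient. The names `ring_mul` and `ring_lwr_synth` are hypothetical, and coefficient-wise rounding in the power basis is an assumption of this sketch; a practical version would use an FFT-like transform rather than the O(n^2) loop.

```python
def round_to_p(x, q, p):
    """Map x in Z_q to Z_p: nearest integer to (p/q)*x (ties round up), reduced mod p."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def ring_mul(a, b, q):
    """Multiply a and b in Z_q[z]/(z^n + 1); coefficient lists, lowest degree first."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] = (out[k] + a[i] * b[j]) % q
            else:
                # z^n = -1 in this ring, so high-degree terms wrap around with a sign flip
                out[k - n] = (out[k - n] - a[i] * b[j]) % q
    return out

def ring_lwr_synth(s1, s2, q, p):
    """Sketch of S_{R,q,p}(s1, s2): round each coefficient of s1 * s2 from Z_q to Z_p."""
    return [round_to_p(c, q, p) for c in ring_mul(s1, s2, q)]

if __name__ == "__main__":
    print(ring_lwr_synth([1, 2], [3, 4], 16, 4))
```

The output has the same number of coefficients as each input, which is the size-matching property the tree construction needs.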

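Using the recursive input-doubling construction from Equation (1.1) above, we get the following concrete
PRF with input length $k = 2^d$. Let $q_d > q_{d-1} > \cdots > q_0 \geq 2$ be a chain of moduli where each $q_j / q_{j-1}$
is a sufficiently large integer, e.g., $q_j = q^{j+1}$ for some $q \geq \sqrt{n}$. The secret key is a set of $2k$ matrices
$\mathbf{S}_{i,b} \in \mathbb{Z}_{q_d}^{n \times n}$ for each $i \in \{1, \ldots, k\}$ and $b \in \{0, 1\}$. Each pair $(\mathbf{S}_{i,0}, \mathbf{S}_{i,1})$ defines a 1-bit PRF $F_i(b) = \mathbf{S}_{i,b}$,
and these are combined in a tree-like fashion according to Equation (1.1) using the appropriate synthesizers
$T_{n,q_j,q_{j-1}}$ for $j = d, \ldots, 1$. As a concrete example, when $k = 8$ (so $x = x_1 \cdots x_8$ and $d = 3$), we have

$$F_{\{\mathbf{S}_{i,b}\}}(x) = \Big\lfloor \big\lfloor \lfloor \mathbf{S}_{1,x_1} \cdot \mathbf{S}_{2,x_2} \rceil_{q_2} \cdot \lfloor \mathbf{S}_{3,x_3} \cdot \mathbf{S}_{4,x_4} \rceil_{q_2} \big\rceil_{q_1} \cdot \big\lfloor \lfloor \mathbf{S}_{5,x_5} \cdot \mathbf{S}_{6,x_6} \rceil_{q_2} \cdot \lfloor \mathbf{S}_{7,x_7} \cdot \mathbf{S}_{8,x_8} \rceil_{q_2} \big\rceil_{q_1} \Big\rceil_{q_0}. \quad (1.3)$$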

(In the ring setting, we just use random elements $s_{i,b} \in R_{q_d}$ in place of the matrices $\mathbf{S}_{i,b}$.) Notice that the
function involves $d = \lg k$ levels of matrix (or ring) products, each followed by a rounding operation. In the
exemplary case where $q_j = q^{j+1}$, the rounding operations essentially drop the “least-significant” base-$q$ digit,
so they can be implemented very easily in practice, especially if every $q_j$ is a power of 2. The function is also
amenable to all of the nice time/space trade-offs, seed-compression techniques, and incremental computation
ideas described in [NR95].
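
The tree-based function with the example moduli chain q_j = q^(j+1) can be sketched as follows. This is an illustrative toy under stated assumptions: the name `lwr_tree_prf`, the list-of-lists matrix representation, the tie-breaking rule in `round_to_p`, and the tiny parameters are all choices made here, not taken from the paper.

```python
def round_to_p(x, q, p):
    """Map x in Z_q to Z_p: nearest integer to (p/q)*x (ties round up), reduced mod p."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def mat_mul_mod(A, B, q):
    """Product of two matrices (lists of rows) with entries reduced modulo q."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % q
             for j in range(len(B[0]))] for i in range(len(A))]

def lwr_tree_prf(keys, x, q):
    """Tree-based PRF sketch: keys[i][b] is S_{i,b} over Z_{q^(d+1)}, x has k = 2^d bits."""
    k = len(x)
    d = k.bit_length() - 1
    assert k == 1 << d, "input length must be a power of two"
    level = [keys[i][x[i]] for i in range(k)]        # leaves: the 1-bit PRFs F_i(x_i)
    for j in range(d, 0, -1):
        q_hi, q_lo = q ** (j + 1), q ** j            # q_j and q_{j-1}
        # Apply T_{n, q_j, q_{j-1}} to each adjacent pair: multiply mod q_j, then round.
        level = [
            [[round_to_p(v, q_hi, q_lo) for v in row]
             for row in mat_mul_mod(level[2 * i], level[2 * i + 1], q_hi)]
            for i in range(len(level) // 2)
        ]
    return level[0]                                   # a matrix over Z_{q_0} = Z_q

if __name__ == "__main__":
    import random
    rng = random.Random(0)
    n, q, k = 2, 4, 8                                 # toy sizes only
    keys = [tuple([[rng.randrange(q ** 4) for _ in range(n)] for _ in range(n)]
                  for _ in range(2)) for _ in range(k)]
    print(lwr_tree_prf(keys, [1, 0, 1, 1, 0, 0, 1, 0], q))
```

Each pass of the loop is one tree level, so an 8-bit input takes three rounds of (independent) products and roundings.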

In the security proof, we rely on the conjectured hardness of $\mathsf{LWR}_{n,q_j,q_{j-1}}$ for $j = d, \ldots, 1$. The strongest
of these assumptions appears to be for $j = d$, and this is certainly the case when relying on our reduction from
LWE to LWR. For the example parameters $q_j = q^{j+1}$ where $q \approx \sqrt{n}$, the dominating assumption is therefore
the hardness of $\mathsf{LWR}_{n,q^{d+1},q^{d}}$, which involves a quasi-polynomial inverse error rate of $1/\alpha \approx q^d = n^{O(\lg k)}$.
However, because the strongest assumptions are applied to the “innermost” layers of the function, it is unclear
whether security actually requires such strong assumptions, or even whether the innermost layers need to be
rounded at all. We discuss these issues further in Section 1.2 below.


Degree-k synthesizers and shallower PRFs. One moderate drawback of the above function is that it
involves $\lg k$ levels of rounding operations, which appears to lower-bound the depth of any circuit computing
the function by $\Omega(\lg k)$. Is it possible to do better?

Recall that in later works, Naor and Reingold [NR97] and Naor, Reingold, and Rosen [NRR00] gave
direct, more efficient number-theoretic PRF constructions which, while still requiring exponentiation in large
multiplicative groups, can in principle be computed in very shallow circuit classes like NC^1 or even TC^0.
Their functions can be interpreted as “degree-k” (or k-argument) synthesizers for arbitrary $k = \mathrm{poly}(n)$,
which immediately yield k-bit PRFs without requiring any composition. With this in mind, a natural question
is whether there are direct LWE/LWR-based synthesizers of degree $k > 2$.

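We give a positive answer to this question. Much like the functions of [NR97, NRR00], ours have a
subset-product structure. We have public moduli $q \gg p$, and the secret key is a set of $k$ matrices $\mathbf{S}_i \in \mathbb{Z}_q^{n \times n}$
(whose distributions may not necessarily be uniform; see below) for $i = 1, \ldots, k$, along with a uniformly
random $\mathbf{a} \in \mathbb{Z}_q^n$.¹ The function $F = F_{\mathbf{a},\{\mathbf{S}_i\}} : \{0,1\}^k \to \mathbb{Z}_p^n$ is defined as the “rounded subset-product”

$$F_{\mathbf{a},\{\mathbf{S}_i\}}(x_1 \cdots x_k) = \Big\lfloor \mathbf{a}^t \cdot \prod_{i=1}^{k} \mathbf{S}_i^{x_i} \Big\rceil_p. \quad (1.4)$$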


The ring variant is analogous, replacing $\mathbf{a}$ with uniform $a \in R_q$ (or $R_q^*$, the set of invertible elements)
and each $\mathbf{S}_i$ by some $s_i \in R_q$. This function is particularly efficient to evaluate using the discrete Fourier
transform, as is standard with ring-based primitives (see, e.g., [LMPR08, LPR10]). In addition, similarly
to [NR97, NRR00], one can optimize the subset-product operation via pre-processing, and evaluate the
function in TC^0. We elaborate on these optimizations in Section 5.2.
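
The rounded subset-product of Equation (1.4) can be sketched directly. The function name `subset_product_prf` and the plain-list representation are assumptions of this illustration, and the sequential loop below does not reflect the pre-processing or shallow-circuit optimizations mentioned above.

```python
def round_to_p(x, q, p):
    """Map x in Z_q to Z_p: nearest integer to (p/q)*x (ties round up), reduced mod p."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def subset_product_prf(a, S_list, x, q, p):
    """Sketch of F(x) = round(a^t * prod_i S_i^{x_i})_p for a in Z_q^n, S_i in Z_q^{n x n}."""
    v = [c % q for c in a]
    for S, bit in zip(S_list, x):
        if bit:                                   # S_i^{x_i} is the identity when x_i = 0
            v = [sum(v[r] * S[r][c] for r in range(len(v))) % q
                 for c in range(len(S[0]))]       # row vector times matrix, mod q
    return [round_to_p(c, q, p) for c in v]       # a single rounding at the very end

if __name__ == "__main__":
    a = [1, 2]
    S_list = [[[1, 1], [0, 1]], [[2, 0], [0, 2]]]
    for x in ([0, 0], [1, 0], [1, 1]):
        print(x, subset_product_prf(a, S_list, x, 16, 4))
```

In contrast to the tree-based function, rounding happens only once, after the whole subset product has been formed.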

For the security analysis of construction (1.4), we have meaningful security proofs under various
conditions on the parameters and computational assumptions, including standard LWE. In our LWE-based proof,
two important issues are the distribution of the secret key components $\mathbf{S}_i$, and the choice of moduli $q$ and $p$.
For the former, it turns out that our proof needs the $\mathbf{S}_i$ matrices to be short, i.e., their entries should be drawn
from the LWE error distribution. (LWE is no easier to solve for such short secrets [ACPS09].) This appears
to be an artifact of our proof technique, which can be viewed as a variant of our LWE-to-LWR reduction,
enhanced to handle adversarial queries. Summarizing the approach, define

$$G(x) = G_{\mathbf{a},\{\mathbf{S}_i\}}(x) := \mathbf{a}^t \cdot \prod_{i=1}^{k} \mathbf{S}_i^{x_i}$$

to be the subset-product function inside the rounding operation of (1.4). The fact that $F = \lfloor G \rceil_p$ lets us
imagine adding independent error terms to each distinct output of $G$, but only as part of a thought experiment
in the proof. More specifically, we consider a related randomized function $\tilde{G} = \tilde{G}_{\mathbf{a},\{\mathbf{S}_i\}} : \{0,1\}^k \to \mathbb{Z}_q^n$
that computes the subset-product by multiplying by each $\mathbf{S}_i^{x_i}$ in turn, but then also adds a fresh error term
immediately following each multiplication. Using the LWE assumption and induction on $k$, we can show that
the randomized function $\tilde{G}$ is itself pseudorandom (over $\mathbb{Z}_q$), hence so is $\lfloor \tilde{G} \rceil_p$ (over $\mathbb{Z}_p$). Moreover, we show
that for every queried input, with high probability $\lfloor \tilde{G} \rceil_p$ coincides with $\lfloor G \rceil_p = F$, because $G$ and $\tilde{G}$ differ
only by a cumulative error term that is small relative to $q$; this is where we need to assume that the entries of
$\mathbf{S}_i$ are small. Finally, because $\lfloor \tilde{G} \rceil_p$ is a (randomized) pseudorandom function over $\mathbb{Z}_p$ that coincides with
the deterministic function $F$ on all queries, we can conclude that $F$ is pseudorandom as well.

¹ To obtain longer function outputs, we can replace $\mathbf{a} \in \mathbb{Z}_q^n$ with a uniformly random matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ for any $m = \mathrm{poly}(n)$.

In the above-described proof strategy, the gap between $G$ and $\tilde{G}$ grows exponentially in $k$, because we
add a separate noise term following each multiplication by an $\mathbf{S}_i$, which gets enlarged when multiplied by
all the later $\mathbf{S}_i$. So in order to ensure that $\lfloor \tilde{G} \rceil_p = \lfloor G \rceil_p$ on all queries, our LWE-based proof needs both the
modulus $q$ and inverse error rate $1/\alpha$ to exceed $n^{\Omega(k)}$. In terms of efficiency and security, this compares
rather unfavorably with the quasipolynomial $n^{O(\lg k)}$ bound in the proof for our tree-based construction,
though on the positive side, the direct degree-$k$ construction has better circuit depth. However, just as with
the tree-based construction (1.3), it is unclear whether such strong assumptions and large parameters are actually
necessary for security, or whether the matrices $\mathbf{S}_i$ really need to be short.
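
The growth of the gap between the exact subset-product and its noisy counterpart can be observed numerically. The sketch below is only a toy version of the thought experiment: it assumes key matrices with entries in {-1, 0, 1} and errors bounded by `err_bound` (in place of the discrete Gaussians of the actual proof), and the name `error_gap` is hypothetical. Under those assumptions each multiplication enlarges the accumulated error by a factor of at most n and then adds one fresh error, so after k multiplications the gap is at most err_bound · (n^k − 1)/(n − 1).

```python
import random

def vec_mat_mod(v, M, q):
    """Row vector times matrix, entries reduced modulo q."""
    return [sum(v[r] * M[r][c] for r in range(len(v))) % q for c in range(len(M[0]))]

def centered(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

def error_gap(a, S_list, x, q, err_bound, rng):
    """Largest coordinate-wise distance between the exact and the noisy subset-product."""
    exact, noisy = list(a), list(a)
    for S, bit in zip(S_list, x):
        if bit:
            exact = vec_mat_mod(exact, S, q)
            noisy = vec_mat_mod(noisy, S, q)
            # fresh bounded error after each multiplication (toy error model)
            noisy = [(v + rng.randint(-err_bound, err_bound)) % q for v in noisy]
    return max(abs(centered(u - v, q)) for u, v in zip(noisy, exact))

if __name__ == "__main__":
    rng = random.Random(0)
    n, k, q, err_bound = 4, 6, 2**40, 1
    a = [rng.randrange(q) for _ in range(n)]
    S_list = [[[rng.choice([-1, 0, 1]) for _ in range(n)] for _ in range(n)]
              for _ in range(k)]
    worst_case = err_bound * (n**k - 1) // (n - 1)
    print(error_gap(a, S_list, [1] * k, q, err_bound, rng), "<=", worst_case)
```

The worst-case bound is exponential in k, which matches the requirement in the text that q (and 1/α) grow like n^Ω(k) for the rounded values to agree.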

In particular, it would be nice if the function in (1.4) were secure if the $\mathbf{S}_i$ matrices were uniformly
random over $\mathbb{Z}_q^{n \times n}$, because we could then recursively compose the function in a $k$-ary tree to rapidly extend
its input length.² It would be even better to have a security proof for a smaller modulus $q$ and inverse error rate
$1/\alpha$, ideally both polynomial in $n$ even for large $k$. While we have been unable to find such a security proof
under standard LWE, we do give a very tight proof under a new, interactive “related samples” LWE/LWR
assumption. Roughly speaking, the assumption says that LWE/LWR remains hard even when the sampled
$\mathbf{a}_i$ vectors are related by adversarially chosen subset-products of up to $k$ given random matrices (drawn
from some known distribution). This provides some evidence that the function may indeed be secure for
appropriately distributed $\mathbf{S}_i$, small modulus $q$, and large $k$. For further discussion, see Section 1.2.


PRFs  via  the  GGM  construction.  The  above  constructions  aim  to  minimize  the  depth  of  the  circuit 
evaluating  the  PRF.  However,  if  parallel  complexity  is  not  a  concern,  and  one  wishes  to  minimize  the 
total  amount  of  work  per  PRF  evaluation  (or  the  seed  length),  then  the  original  GGM  construction  with  an 
LWR-based  pseudorandom  generator  may  turn  out  to  be  even  more  efficient  in  practice. 

Recall that the GGM construction makes generic use of any length-doubling pseudorandom generator
$G : \{0,1\}^n \to \{0,1\}^{2n}$. The generator’s output $G(s)$ is viewed as a pair $(G_0(s), G_1(s))$, where $|G_0(s)| =
|G_1(s)| = n$. The key for a member of the PRF family is a seed $s$ for $G$, and on input $x \in \{0,1\}^k$ the
function is defined as

$$F_s(x_1 \cdots x_k) = G_{x_k}(G_{x_{k-1}}(\cdots G_{x_1}(s) \cdots)). \tag{1.5}$$

² Note that we can always compose the degree-$k$ function with our degree-2 synthesizers from above, but this would only yield a
tree with 2-ary internal nodes.

As  mentioned  above,  the  LWR  problem  immediately  yields  a  simple  and  practical  pseudorandom  generator 
that,  in  contrast  to  the  generators  obtained  from  the  LWE  or  LPN  problems,  does  not  require  extracting 
biased  random  error  terms  from  its  input  seed.  By  plugging  this  generator  into  the  GGM  construction 
we  immediately  get  a  PRF  whose  evaluation  involves  precisely  k  sequential  evaluations  of  the  underlying 
generator. 

The LWR-based generator that we have in mind is a function $G_A : \mathbb{Z}_q^n \to \mathbb{Z}_p^m$, where the moduli $q > p$
and the (uniformly random) matrix $A \in \mathbb{Z}_q^{n \times m}$ are publicly known. Given a seed $s \in \mathbb{Z}_q^n$, the generator is
defined as

$$G_A(s) = \lfloor A^t \cdot s \rceil_p. \tag{1.6}$$

The generator’s seed length (in bits) is $n \log_2 q$ and its output length is $m \log_2 p$, which gives an expansion
rate of $(m \log_2 p)/(n \log_2 q) = (m/n) \log_q p$. For example, to obtain a length-doubling generator we may
set $q = p^2 = 2^{2k} > n$ and $m = 4n$, so that $(m/n) \log_q p = 4 \cdot \tfrac{1}{2} = 2$. (Other choices yielding different expansion rates are of course possible.)
This choice of parameters has the additional benefit of admitting a practical implementation of the rounding
and inner-product operations. Note also that when evaluating the resulting PRF, one can get the required part
of $G_A(s)$ by computing only the inner products of $s$ with the corresponding columns of $A$, not the entire
product $A^t \cdot s$.
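The following Python sketch illustrates how the generator (1.6) could be plugged into the GGM evaluation (1.5). It is only an illustration under stated assumptions: the toy parameters are far too small to be secure, the helper names are hypothetical, and the way two base-$p$ output digits are packed into one element of $\mathbb{Z}_q$ (so that half of the output can serve as the next seed) is our own choice, which the text does not specify.

```python
import random

def round_p(x, q, p):
    """Nearest-integer rounding of (p/q)*x, reduced mod p (Equation (2.1))."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def lwr_generator_half(A, s, bit, q, p):
    """Return half `bit` of G_A(s) = round_p(A^t s), using only the needed columns of A."""
    n, m = len(A), len(A[0])
    cols = range(bit * m // 2, (bit + 1) * m // 2)
    digits = [round_p(sum(A[i][j] * s[i] for i in range(n)), q, p) for j in cols]
    # Assumed packing: two base-p digits form one element of Z_q (valid because q = p*p).
    return [digits[2 * t] * p + digits[2 * t + 1] for t in range(len(digits) // 2)]

def ggm_prf(A, s, x_bits, q, p):
    """F_s(x_1 ... x_k) = G_{x_k}( ... G_{x_1}(s) ... ): k sequential generator calls."""
    for bit in x_bits:
        s = lwr_generator_half(A, s, bit, q, p)
    return s

# Toy parameters: q = p^2 and m = 4n give a length-doubling generator.
n, p = 4, 16
q, m = p * p, 4 * n
rng = random.Random(0)
A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
seed = [rng.randrange(q) for _ in range(n)]
out = ggm_prf(A, seed, [0, 1, 1, 0], q, p)
```

Each input bit selects one half of the generator output, so only $m/2$ inner products are computed per level, as noted above.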

For an even faster implementation one may replace $G_A$ by its analogous ring variant, obtained by
replacing $A \in \mathbb{Z}_q^{n \times m}$ with uniform $a \in R_q^m$, and $s \in \mathbb{Z}_q^n$ with uniform $s \in R_q$. As noted before, the ring
variant is particularly efficient to evaluate using Fast Fourier Transform-like algorithms.

1.2  Discussion  and  Open  Questions 

The quasipolynomial $n^{O(\log k)}$ or exponential $n^{O(k)}$ moduli and inverse error rates used in our LWE-based
security proofs are comparable to those used in recent fully homomorphic encryption (FHE) schemes
(e.g., [Gen09, vGHV10, BV11b, BV11a, BGV11]), hierarchical identity-based encryption (HIBE) schemes
(e.g., [CHKP10, ABB10a, ABB10b]), and other lattice-based constructions. However, there appears to be a
major  difference  between  our  use  of  such  strong  assumptions,  and  that  of  schemes  such  as  FHE/HIBE  in 
the  public-key  setting.  Constructions  of  the  latter  systems  actually  reveal  LWE  samples  having  very  small 
error  rates  (which  are  needed  to  ensure  correctness  of  decryption)  to  the  attacker,  and  the  attacker  can  break 
the  cryptosystems  by  solving  those  instances.  Therefore,  the  underlying  assumptions  and  the  true  security 
of  the  schemes  are  essentially  equivalent.  In  contrast,  our  PRF  uses  (small)  errors  only  as  part  of  a  thought 
experiment  in  the  security  proof,  not  for  any  purpose  in  the  operation  of  the  function  itself.  This  leaves  open 
the  possibility  that  our  functions  (or  slight  variants)  remain  secure  even  for  much  larger  input  lengths  and 
smaller  moduli  than  our  proofs  require.  We  conjecture  that  this  is  the  case,  even  though  we  have  not  yet  found 
security  proofs  (under  standard  assumptions)  for  these  more  efficient  parameters.  Certainly,  determining 
whether  there  are  effective  cryptanalytic  attacks  is  a  very  interesting  and  important  research  direction. 

Note that in our construction (1.4), if we draw the secret key components from the uniform (or error)
distribution and allow $k$ to be too large relative to $q$, then the function can become insecure via a simple
attack (and our new “interactive” LWR assumption, which yields a tight security proof, becomes false).
This is easiest to see for the ring-based function: representing each $S_i \in R_q$ by its vector of “Fourier
coefficients” over $\mathbb{Z}_q^n$, each coefficient is 0 with probability about $1/q$ (depending on the precise distribution
of $S_i$). Therefore, with noticeable probability the product of $k = \Omega(q \log n)$ random $S_i$ will have all-0 Fourier
coefficients, i.e., will be $0 \in R_q$. In this case our function will return zero on the all-1s input, in violation
of the PRF requirement. (A similar but more complicated analysis can also be applied to the matrix-based
function.) Of course, an obvious countermeasure is just to restrict the secret key components to be invertible;
to our knowledge, this does not appear to have any drawback in terms of security. In fact, it is possible to
show that the decision-(ring-)LWE problem remains hard when the secret is restricted to be invertible (and
otherwise drawn from the uniform or error distribution), and this fact may be useful in further analysis of the
function with more efficient parameters.
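The zero-product effect described above can be observed in a toy simulation. The sketch below is an assumption-laden model rather than the actual construction: each ring element is represented directly by $n$ independent uniform “Fourier coefficients” in $\mathbb{Z}_q$ (which is accurate when $R_q$ splits completely, e.g., for prime $q = 1 \bmod 2n$), ring multiplication becomes a coordinate-wise product, and the function name and parameters are hypothetical.

```python
import random

def random_product(n, q, k, rng, invertible=False):
    """Coordinate-wise product of k random elements of Z_q^n (Fourier representation).

    With invertible=True every coordinate of every factor is drawn from {1, ..., q-1},
    which for prime q is exactly the set of units of Z_q.
    """
    lo = 1 if invertible else 0
    prod = [1] * n
    for _ in range(k):
        prod = [(c * rng.randrange(lo, q)) % q for c in prod]
    return prod

rng = random.Random(1)
n, q, k = 16, 97, 5000           # q is prime and q = 1 mod 2n; k is well above q*log(n)
collapsed = random_product(n, q, k, rng)
safe = random_product(n, q, k, rng, invertible=True)
```

With these parameters each coordinate survives with probability about $(1 - 1/q)^k \approx e^{-51}$, so `collapsed` is the zero vector except with negligible probability, whereas restricting the factors to invertible elements keeps every coordinate of `safe` nonzero.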

In  summary,  our  work  raises  several  interesting  concrete  questions,  including: 

• Is LWR$_{n,q,p}$ really exponentially hard for $p = \mathrm{poly}(n)$ and sufficiently large integer $q/p = \mathrm{poly}(n)$?
Are there stronger worst-case hardness guarantees than our current proof based on LWE?

• Is there a security proof for construction (1.4) (with $k = \omega(1)$) for poly($n$)-bounded moduli and
inverse error rates, under a non-interactive assumption?

• In construction (1.4), is there a security proof (under a non-interactive assumption) for uniformly
random $S_i$? Is there any provable security advantage to using invertible $S_i$?

•  Our  derandomization  technique  and  LWR  problem  require  working  with  moduli  q  greater  than  2.  Is 
there  an  efficient,  parallel  PRF  construction  based  on  the  learning  parity  with  noise  (LPN)  problem? 

1.3  Other  Related  Work 

Most closely related to the techniques in this work are two very recent results of Brakerski and
Vaikuntanathan [BV11a] and a follow-up work of Brakerski, Gentry, and Vaikuntanathan [BGV11] on fully
homomorphic encryption from LWE. In particular, the former work includes a “modulus reduction” technique for
LWE-based cryptosystems, which maps a large-modulus ciphertext to a small-modulus one; this induces a
shallower decryption circuit and allows the system to be “bootstrapped” into a fully homomorphic scheme
using the techniques of [Gen09]. The modulus-reduction technique involves a rounding operation much like
the one we use to derandomize LWE; while they use it on ciphertexts that are already “noisy,” we apply it
to noise-free LWE samples. Our discovery of the rounding/derandomization technique in the PRF context
was independent of [BV11a]. In fact, the first PRF and security proof we found were for the direct degree-$k$
construction defined in (1.4), not the synthesizer-based construction in (1.3). As another point of comparison,
the “somewhat homomorphic” cryptosystem from [BV11a] that supports degree-$k$ operations (along with all
prior ones, e.g., [Gen09, vGHV10]) involves an inverse error rate of $n^{O(k)}$, much like the LWE-based proof
for our degree-$k$ synthesizer.

Building on the modulus reduction technique of [BV11a], Brakerski et al. [BGV11] showed that
homomorphic cryptosystems can support certain degree-$k$ functions using a much smaller modulus and inverse
error rate of $n^{O(\log k)}$. The essential idea is to interleave the homomorphic operations with several “small”
modulus-reduction steps in a tree-like fashion, rather than performing all the homomorphic operations
followed by one “huge” modulus reduction. This very closely parallels the difference between our direct
degree-$k$ synthesizer and the Naor-Reingold-like [NR95] composed synthesizer defined in (1.3). Indeed, after
we found construction (1.4), the result of [BGV11] inspired our search for a PRF having similar tree-like
structure and quasipolynomial error rates. Given our degree-2 synthesizer, the solution turned out to largely
be laid out in the work of [NR95]. We find it very interesting that the same quantitative phenomena arise in
two seemingly disparate settings (PRFs and FHE).

1.4  Organization

The rest of the paper is organized as follows. In Section 2 we recall the necessary preliminaries regarding
PRFs and the (ring-)LWE problem. In Section 3 we introduce the “learning with rounding” (LWR) problem
and discuss its relationship with LWE. In Section 4 we describe LWR-based (degree-2) synthesizers and the
PRFs that follow from them. In Section 5 we describe our direct degree-$k$ synthesizer/PRF and its security
proofs under the LWE and “subset-product” LWR problem.

2  Preliminaries 

For a probability distribution $X$ over a domain $D$, let $X^n$ denote its $n$-fold product distribution over $D^n$.
The uniform distribution over a finite domain $D$ is denoted by $U(D)$. The discrete Gaussian probability
distribution over $\mathbb{Z}$ with parameter $r > 0$, denoted $D_{\mathbb{Z},r}$, assigns probability proportional to $\exp(-\pi x^2/r^2)$
to each $x \in \mathbb{Z}$. It is possible to efficiently sample from this distribution (up to negl($n$) statistical distance) via
rejection [GPV08].

For any integer modulus $q \geq 2$, $\mathbb{Z}_q$ denotes the quotient ring of integers modulo $q$. We define a ‘rounding’
function $\lfloor \cdot \rceil_p : \mathbb{Z}_q \to \mathbb{Z}_p$, where $q \geq p \geq 2$ will be apparent from the context, as

$$\lfloor x \rceil_p = \lfloor (p/q) \cdot \bar{x} \rceil \bmod p, \tag{2.1}$$

where $\bar{x} \in \mathbb{Z}$ is any integer congruent to $x \bmod q$. We extend $\lfloor \cdot \rceil_p$ component-wise to vectors and matrices
over $\mathbb{Z}_q$, and coefficient-wise (with respect to the “power basis”) to the quotient ring $R_q$ defined in the next
subsection. Note that we can use any other common rounding method, like the floor $\lfloor \cdot \rfloor$ or ceiling $\lceil \cdot \rceil$
functions, in Equation (2.1) above, with only minor changes to our proofs. In implementations, it may be
advantageous to use the floor function $\lfloor \cdot \rfloor$ when $q$ and $p$ are both powers of some common base $b$ (e.g., 2). In
this setting, computing $\lfloor \cdot \rfloor_p$ is equivalent to dropping the least-significant digit(s) in base $b$.
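As a concrete illustration of Equation (2.1) and of the floor variant for power-of-2 moduli, consider the following minimal Python sketch. The function names are hypothetical, and ties in the nearest-integer rounding are broken upward, which is one of several acceptable conventions.

```python
def round_nearest(x, q, p):
    """Equation (2.1): nearest integer to (p/q) * xbar, reduced mod p (ties round up)."""
    xbar = x % q                      # any representative of x mod q gives the same result
    return ((2 * p * xbar + q) // (2 * q)) % p

def round_floor_pow2(x, log_q, log_p):
    """Floor variant for q = 2**log_q and p = 2**log_p: drop the low-order bits."""
    return (x % (1 << log_q)) >> (log_q - log_p)

# Example with q = 16 and p = 4: 15/16 of the way around Z_q rounds to 4 = 0 mod 4.
examples = [round_nearest(x, 16, 4) for x in (0, 1, 2, 15)]
```

The floor variant needs only a shift, whereas the nearest-integer variant needs an addition of $q/2$ before the division.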

2.1  Pseudorandom  Functions 

The main security parameter throughout this paper is $n$, and all algorithms (including the adversary) are implicitly
given the security parameter $n$ in unary. We write negl($n$) to denote an arbitrary negligible function in $n$, one
that vanishes faster than the inverse of any polynomial. We say that a probability is overwhelming if it is
$1 - \mathrm{negl}(n)$.

We consider adversaries interacting as part of probabilistic experiments called games. For an adversary
$\mathcal{A}$ and two games $H_0, H_1$ with which it can interact, $\mathcal{A}$’s distinguishing advantage (implicitly, as a function
of $n$) is $\mathbf{Adv}_{H_0,H_1}(\mathcal{A}) := |\Pr[\mathcal{A} \text{ accepts in } H_0] - \Pr[\mathcal{A} \text{ accepts in } H_1]|$.

Definition 2.1 (Computational Indistinguishability). We say that games $H_0, H_1$ are computationally
indistinguishable, written $H_0 \stackrel{c}{\approx} H_1$, if $\mathbf{Adv}_{H_0,H_1}(\mathcal{A}) = \mathrm{negl}(n)$ for any probabilistic polynomial-time $\mathcal{A}$.

By the triangle inequality, $\stackrel{c}{\approx}$ is a transitive relation over any poly($n$)-length sequence of games. If
$H_0 \stackrel{c}{\approx} H_1$ and $\mathcal{S}$ is any probabilistic polynomial-time algorithm, then the outputs of $\mathcal{S}$ playing in games $H_0$
and $H_1$ (respectively) are also computationally indistinguishable.

Definition 2.2 (Pseudorandom functions). Let $A$ and $B$ be finite sets, and let $\mathcal{F} = \{F : A \to B\}$ be a
function family, endowed with an efficiently sampleable distribution (more precisely, $\mathcal{F}$, $A$ and $B$ are all
indexed by the security parameter $n$). We say that $\mathcal{F}$ is a pseudorandom function (PRF) family if the following
two games are computationally indistinguishable:

1. Choose a function $F \leftarrow \mathcal{F}$ and give the adversary adaptive oracle access to $F(\cdot)$.

2. Choose a uniformly random function $U : A \to B$ and give the adversary adaptive oracle access to $U(\cdot)$.

To efficiently simulate access to a uniformly random function $U : A \to B$, one may think of a process in
which the adversary’s queries are “lazily” answered with independently and randomly chosen elements in $B$,
while keeping track of the answers so that queries made more than once are answered consistently.
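The lazy simulation just described can be sketched in a few lines of Python. The class name and the choice of $B = \{0, \ldots, \texttt{codomain\_size} - 1\}$ are assumptions made for illustration; Python's `random` module is used only for reproducibility and is not a cryptographic source of randomness.

```python
import random

class LazyRandomFunction:
    """Simulates a uniformly random function U: A -> B without sampling it up front."""

    def __init__(self, codomain_size, rng=None):
        self.codomain_size = codomain_size
        self.rng = rng if rng is not None else random.Random()
        self.table = {}               # remembers every answer given so far

    def __call__(self, x):
        if x not in self.table:       # fresh query: sample an independent uniform answer
            self.table[x] = self.rng.randrange(self.codomain_size)
        return self.table[x]          # repeated query: answer consistently

U = LazyRandomFunction(2 ** 16, random.Random(0))
first = U("0101")
again = U("0101")
other = U("1110")
```

The memory used grows only with the number of distinct queries, not with the size of the domain $A$.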


2.2  (Ring)  Learning  With  Errors 


We recall the learning with errors (LWE) problem due to Regev [Reg05] and its ring analogue by
Lyubashevsky, Peikert, and Regev [LPR10]. For positive integer dimension $n$ (the security parameter) and modulus
$q \geq 2$, a probability distribution $\chi$ over $\mathbb{Z}$, and a vector $s \in \mathbb{Z}_q^n$, define the LWE distribution $A_{s,\chi}$ to be
the distribution over $\mathbb{Z}_q^n \times \mathbb{Z}_q$ obtained by choosing a vector $a \leftarrow \mathbb{Z}_q^n$ uniformly at random, an error term
$e \leftarrow \chi$, and outputting $(a, b = \langle a, s \rangle + e \bmod q)$. We use the following “normal form” of the
decision-LWE$_{n,q,\chi}$ problem, which is to distinguish (with advantage non-negligible in $n$) between any desired number
$m = \mathrm{poly}(n)$ of independent samples $(a_i, b_i) \leftarrow A_{s,\chi}$ where $s \leftarrow \chi^n \bmod q$ is chosen from the (folded)
error distribution, and the same number of samples from the uniform distribution $U(\mathbb{Z}_q^n \times \mathbb{Z}_q)$. This form of
the problem is as hard as the one where $s \in \mathbb{Z}_q^n$ is chosen uniformly at random [ACPS09].

We extend the LWE distribution to $w \geq 1$ secrets, defining $A_{S,\chi}$ for $S \in \mathbb{Z}_q^{n \times w}$ to be the distribution
obtained by choosing $a \leftarrow \mathbb{Z}_q^n$, an error vector $e^t \leftarrow \chi^w$, and outputting $(a, b^t = a^t S + e^t \bmod q)$. By a
standard hybrid argument, distinguishing such samples (for $S \leftarrow \chi^{n \times w}$) from uniformly random is as hard
as decision-LWE$_{n,q,\chi}$, for any $w = \mathrm{poly}(n)$. It is often convenient to group many (say, $m$) sample pairs
together in matrices. This allows us to express the LWE problem as: distinguish any desired number of pairs
$(A^t, B^t = A^t S + E \bmod q) \in \mathbb{Z}_q^{m \times n} \times \mathbb{Z}_q^{m \times w}$, for the same $S$, from uniformly random.
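To make the matrix form concrete, here is a small Python sketch that generates pairs $(A^t, B^t = A^t S + E \bmod q)$. It rests on explicit assumptions: the error entries are drawn uniformly from $[-B, B]$ as a simple bounded stand-in for the discrete Gaussian $\chi$, the parameters are toy-sized, and all names are hypothetical.

```python
import random

def lwe_matrix_samples(S, m, q, B, rng):
    """Return (At, Bt, E) with Bt = At*S + E mod q; At is m-by-n uniform over Z_q."""
    n, w = len(S), len(S[0])
    At = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    # Stand-in error distribution (assumption): uniform over the integers in [-B, B].
    E = [[rng.randint(-B, B) for _ in range(w)] for _ in range(m)]
    Bt = [[(sum(At[i][t] * S[t][j] for t in range(n)) + E[i][j]) % q
           for j in range(w)] for i in range(m)]
    return At, Bt, E

def centered(v, q):
    """Representative of v mod q in the interval (-q/2, q/2]."""
    v %= q
    return v - q if v > q // 2 else v

rng = random.Random(0)
n, w, m, q, B = 4, 3, 6, 257, 2
S = [[rng.randrange(q) for _ in range(w)] for _ in range(n)]
At, Bt, E = lwe_matrix_samples(S, m, q, B, rng)
# Anyone who knows S can strip off At*S and recover the small error matrix.
recovered = [[centered(Bt[i][j] - sum(At[i][t] * S[t][j] for t in range(n)), q)
              for j in range(w)] for i in range(m)]
```

Every row of `At` is reused across all $w$ columns of $S$, which is exactly the multi-secret setting described above.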

For certain moduli $q$ and (discrete) Gaussian error distributions $\chi$, the decision-LWE problem is as hard
as the search problem, where the goal is to find $s$ given samples from $A_{s,\chi}$ (see, e.g., [Reg05, Pei09, ACPS09,
MM11], and [MP11] for the mildest known requirements on $q$, which include the case where $q$ is a power
of 2). In turn, for $\chi = D_{\mathbb{Z},r}$ with $r = \alpha q \geq 2\sqrt{n}$, the search problem is as hard as approximating worst-case
lattice problems to within $\tilde{O}(n/\alpha)$ factors; see [Reg05, Pei09] for precise statements.³


Ring-LWE. For simplicity of exposition, we use the following special case of the ring-LWE problem. (Our
results can be extended to the more general form defined in [LPR10].) Throughout the paper we let $R$ denote
the cyclotomic polynomial ring $R = \mathbb{Z}[z]/(z^n + 1)$ for $n$ a power of 2. (Equivalently, $R$ is the ring of integers
$\mathbb{Z}[\omega]$ for $\omega = \exp(\pi i/n)$.) For any integer modulus $q$, define the quotient ring $R_q = R/qR$. An element of
$R$ can be represented as a polynomial (in $z$) of degree less than $n$ having integer coefficients; in other words,
the “power basis” $\{1, z, \ldots, z^{n-1}\}$ is a $\mathbb{Z}$-basis for $R$. Similarly, it is a $\mathbb{Z}_q$-basis for $R_q$.
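For readers who want to experiment with $R_q$, multiplication in the power basis is a negacyclic convolution, since $z^n = -1$. The sketch below uses the quadratic-time schoolbook method purely for clarity (FFT-like algorithms are asymptotically faster); the function name is hypothetical.

```python
def ring_mul(a, b, q):
    """Multiply a and b in R_q = Z_q[z]/(z^n + 1), given as power-basis coefficient lists."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # z^k = z^(k-n) * z^n = -z^(k-n), so the term wraps around with a sign flip.
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c

q = 17
one = [1, 0, 0, 0]
z = [0, 1, 0, 0]
z_cubed = [0, 0, 0, 1]
wrap = ring_mul(z, z_cubed, q)        # z * z^3 = z^4 = -1 in R_q
a, b = [3, 1, 4, 1], [5, 9, 2, 6]
```

Here `wrap` equals the coefficient vector of $-1$, i.e., `[16, 0, 0, 0]` modulo 17.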

For a modulus $q$, a probability distribution $\chi$ over $R$, and an element $s \in R_q$, the ring-LWE (RLWE)
distribution $A_{s,\chi}$ is the distribution over $R_q \times R_q$ obtained by choosing $a \in R_q$ uniformly at random, an error
term $x \leftarrow \chi$, and outputting $(a, b = a \cdot s + x \bmod qR)$. The normal form of the decision-RLWE$_{R,q,\chi}$ problem
is to distinguish (with non-negligible advantage) between any desired number $m = \mathrm{poly}(n)$ of independent
samples $(a_i, b_i) \leftarrow A_{s,\chi}$ where $s \leftarrow \chi \bmod q$, and the same number of samples drawn from the uniform
distribution $U(R_q \times R_q)$. We will use the error distribution $\chi$ over $R$ where each coefficient (with respect to
the power basis) is chosen independently from the discrete Gaussian $D_{\mathbb{Z},r}$ for some $r = \alpha q \geq \omega(\sqrt{n \log n})$.

³ It is important to note that the original hardness result of [Reg05] for search-LWE is for a continuous Gaussian error distribution,
which when rounded naively to the nearest integer does not produce a true discrete Gaussian $D_{\mathbb{Z},r}$. Fortunately, a suitable randomized
rounding method does so [Pei10].

For a prime modulus $q = 1 \bmod 2n$ and the error distribution $\chi$ described above, the decision-RLWE
problem is as hard as the search problem, via a reduction that runs in time $q \cdot \mathrm{poly}(n)$ [LPR10]. In turn, the
search problem is as hard as quantumly approximating worst-case problems on ideal lattices.⁴

To bound products of samples drawn from the error distribution $\chi$ over $R$, we recall a useful result
from [LPR10].

Lemma 2.3. Let $\chi$ be the distribution over the ring $R$ where each coefficient (with respect to the power basis)
is chosen independently from $D_{\mathbb{Z},r}$ for some $r > 0$, and let $t = \omega(\sqrt{\log n})$ denote any function that grows
asymptotically faster than $\sqrt{\log n}$. Then in the product of $k \geq 1$ independent samples drawn from $\chi$, every
coefficient is bounded in magnitude by $(r\sqrt{n} \cdot t)^k / \sqrt{n}$, except with $\exp(-\Omega(t^2)) = \mathrm{negl}(n)$ probability.

2.3  Subgaussian  Distributions  and  Random  Matrices 

A random variable $X$ over $\mathbb{R}$ (or its distribution) with $E[X] = 0$ is subgaussian with parameter $r > 0$
if it has Gaussian tails, i.e., for all $t \geq 0$, $\Pr[|X| > t] \leq 2\exp(-\pi(t/r)^2)$.⁵ In particular, $D_{\mathbb{Z},r}$ is
subgaussian with parameter $r$ [Ban95]. Here we recall a useful result from the non-asymptotic theory of
random matrices [Ver11], which bounds the largest singular value (sometimes called the spectral norm)
$s_1(X) := \max_{u \neq 0} \|Xu\| / \|u\|$ of a matrix with independent subgaussian entries.

Lemma 2.4. Let $X \in \mathbb{R}^{n \times m}$ be a random matrix (or vector) whose entries are drawn independently from
(not necessarily identical) subgaussian distributions with common parameter $r$. There exists a universal
constant $C > 0$ such that $s_1(X) \leq r \cdot C(\sqrt{m} + \sqrt{n})$ except with probability at most $2^{-\Omega(m+n)}$.


3  The  Learning  With  Rounding  Problem 

We now define the “learning with rounding” (LWR) problem and its ring analogue, which are like
“derandomized” versions of the usual (ring-)LWE problems, in that the error terms are chosen deterministically.

Definition 3.1. Let $n \geq 1$ be the main security parameter and moduli $q \geq p \geq 2$ be integers.

• For a vector $s \in \mathbb{Z}_q^n$, define the LWR distribution $L_s$ to be the distribution over $\mathbb{Z}_q^n \times \mathbb{Z}_p$ obtained by
choosing a vector $a \leftarrow \mathbb{Z}_q^n$ uniformly at random, and outputting $(a, b = \lfloor \langle a, s \rangle \rceil_p)$.

• For $s \in R_q$ (defined in Section 2.2), define the ring-LWR (RLWR) distribution $L_s$ to be the distribution
over $R_q \times R_p$ obtained by choosing $a \leftarrow R_q$ uniformly at random and outputting $(a, b = \lfloor a \cdot s \rceil_p)$.

For a given distribution over $s \in \mathbb{Z}_q^n$ (e.g., the uniform distribution), the decision-LWR$_{n,q,p}$ problem is
to distinguish (with advantage non-negligible in $n$) between any desired number of independent samples
$(a_i, b_i) \leftarrow L_s$, and the same number of samples drawn uniformly and independently from $\mathbb{Z}_q^n \times \mathbb{Z}_p$. The
decision-RLWR$_{R,q,p}$ problem is defined analogously.
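The LWR distribution of Definition 3.1 is straightforward to sample, as the following Python sketch shows. The parameters are illustrative only and the function names are hypothetical. Note that once $a$ is fixed, $b$ is a deterministic function of $a$ and $s$: no separate error term is sampled.

```python
import random

def round_p(x, q, p):
    """Nearest-integer rounding from Z_q to Z_p, as in Equation (2.1)."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def lwr_sample(s, q, p, rng):
    """One sample (a, b) from L_s: uniform a in Z_q^n and b = round_p(<a, s>)."""
    a = [rng.randrange(q) for _ in s]
    b = round_p(sum(ai * si for ai, si in zip(a, s)), q, p)
    return a, b

rng = random.Random(0)
n, q, p = 8, 4096, 16
s = [rng.randrange(q) for _ in range(n)]
samples = [lwr_sample(s, q, p, rng) for _ in range(5)]
```

In the decision problem, an adversary must tell such a list of samples apart from pairs in which $b$ is uniform over $\mathbb{Z}_p$.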

⁴ More accurately, to prove that the search problem is hard for an a priori unbounded number of RLWE samples, the worst-case
connection from [LPR10] requires the error distribution’s parameters to themselves be chosen at random from a certain distribution.
Our constructions are easily modified to account for this subtlety, but for simplicity, we ignore this issue and assume hardness for a
fixed, public error distribution.

⁵ This simple definition will suffice for our purposes, because we will always use mean-zero distributions. For a more general
definition that applies to any distribution, see [MP11].

Note that we have defined LWR exclusively as a decision problem, as this is the only form of the problem
we will need. By a simple (and by now standard) hybrid argument, the (ring-)LWR problem is no easier,
up to a poly($n$) factor in advantage, if we reuse each public $a_i$ across several independent secrets. That is,
distinguishing samples $(a_i, \lfloor \langle a_i, s_1 \rangle \rceil_p, \ldots, \lfloor \langle a_i, s_\ell \rangle \rceil_p) \in \mathbb{Z}_q^n \times \mathbb{Z}_p^\ell$ from uniform, where each $s_j \in \mathbb{Z}_q^n$ is
chosen independently for any $\ell = \mathrm{poly}(n)$, is at least as hard as decision-LWR for a single secret $s$. An
analogous statement also holds for ring-LWR.

3.1  Reduction  from  LWE 

We now show that for appropriate parameters, decision-LWR is at least as hard as decision-LWE. We say
that a probability distribution $\chi$ over $\mathbb{R}$ (more precisely, a family of distributions $\chi_n$ indexed by the security
parameter $n$) is $B$-bounded (where $B = B(n)$ is a function of $n$) if $\Pr_{x \leftarrow \chi}[|x| > B] \leq \mathrm{negl}(n)$. Similarly, a
distribution over the ring $R$ is $B$-bounded if the marginal distribution of every coefficient (with respect to the
power basis) of an $x \leftarrow \chi$ is $B$-bounded.

Theorem 3.2. Let $\chi$ be any efficiently sampleable $B$-bounded distribution over $\mathbb{Z}$, and let $q \geq p \cdot B \cdot n^{\omega(1)}$.
Then for any distribution over the secret $s \in \mathbb{Z}_q^n$, solving decision-LWR$_{n,q,p}$ is at least as hard as solving
decision-LWE$_{n,q,\chi}$ for the same distribution over $s$. The same holds true for RLWR$_{R,q,p}$ and RLWE$_{R,q,\chi}$, for
any $B$-bounded $\chi$ over $R$.

We note that although our proof uses a super-polynomial $q = n^{\omega(1)}$, as long as $q/p \geq \sqrt{n}$ is an integer,
the LWR problem appears to be exponentially hard (in $n$) for any $p = \mathrm{poly}(n)$, and super-polynomially hard
for $p \leq 2^{n^\epsilon}$ for any $\epsilon < 1$, given the state of the art in noisy learning algorithms [BKW03, AG11] and lattice
reduction algorithms [LLL82, Sch87]. We also note that in our proof, we do not require the error terms drawn
from $\chi$ in the LWE samples to be independent; we just need them all to have magnitude bounded by $B$ with
overwhelming probability.

Proof of Theorem 3.2. We give a detailed proof for the LWR case; the one for RLWR proceeds essentially
identically. The main idea behind the reduction is simple: given pairs $(a_i, b_i) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$ which are
distributed either according to an LWE distribution $A_{s,\chi}$ or are uniformly random, we translate them into
the pairs $(a_i, \lfloor b_i \rceil_p) \in \mathbb{Z}_q^n \times \mathbb{Z}_p$, which we show will be distributed according to the LWR distribution $L_s$
(with overwhelming probability) or uniformly random, respectively. Proving this formally takes some care,
however. We proceed via a sequence of games.

Game $H_0$. This is the real attack game against the LWR distribution. That is, we choose $s$ and upon request
generate and give the attacker independent samples from $L_s$.

Game $H_1$. Here the attack is against a ‘rounded’ version of the LWE distribution $A_{s,\chi}$. That is, we first
choose $s$. Then each time the attacker requests a sample, we generate a pair $(a, b)$ distributed according to
$A_{s,\chi}$ (that is, choose $a \leftarrow \mathbb{Z}_q^n$ and $b = \langle a, s \rangle + x$ for $x \leftarrow \chi$), and return the pair $(a, \lfloor b \rceil_p)$, but with one
exception: we define a ‘bad event’ BAD to be

$$\mathrm{BAD} := \lfloor b + [-B, B] \rceil_p \neq \{\lfloor b \rceil_p\}.$$

That is, BAD indicates whether $b$ is “too close” to some value in $\mathbb{Z}_q$ having a different rounded value. (In
other words, rounding the sample $(a, b)$ from $A_{s,\chi}$ may give a different result than the corresponding sample
$(a, \lfloor \langle a, s \rangle \rceil_p)$ from the $L_s$ distribution.) If BAD occurs on any of the attacker’s requests for a sample, we
immediately abort the game.

If BAD does not occur for a particular sample $(a, b)$, then we have $\lfloor b \rceil_p := \lfloor \langle a, s \rangle + x \rceil_p = \lfloor \langle a, s \rangle \rceil_p$
with overwhelming probability over the choice of $x \leftarrow \chi$, because $\chi$ is $B$-bounded. It immediately follows
that for any (potentially unbounded) attacker $\mathcal{A}$,

$$\mathbf{Adv}_{H_0,H_1}(\mathcal{A}) \leq \Pr[\mathrm{BAD} \text{ occurs in } H_1 \text{ with attacker } \mathcal{A}] + \mathrm{negl}(n). \tag{3.1}$$

We do not directly bound the probability of BAD occurring in $H_1$, instead deferring it to the analysis of the
next game, where we can show that it is indeed negligible.

Game $H_2$. Here whenever the attacker requests a sample, we choose $(a, b) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$ uniformly at random
and give $(a, \lfloor b \rceil_p)$ to the attacker, subject to the same “bad event” and abort condition as described in the
game $H_1$ above. Under the decision-LWE assumption and by the fact that BAD can be tested efficiently
given $b$, a straightforward reduction implies that $\mathbf{Adv}_{H_1,H_2}(\mathcal{A}) \leq \mathrm{negl}(n)$ for any efficient attacker $\mathcal{A}$. For
the same reason, it also follows that

$$|\Pr[\mathrm{BAD} \text{ occurs in } H_1] - \Pr[\mathrm{BAD} \text{ occurs in } H_2]| \leq \mathrm{negl}(n).$$

Now for each uniform $b$, $\Pr[\mathrm{BAD} \text{ occurs on } b \text{ in } H_2] \leq (2B + 1) \cdot p/q = \mathrm{negl}(n)$, by assumption on $q$. It
follows by a union bound over all the samples, and Equation (3.1), that

$$\Pr[\mathrm{BAD} \text{ occurs in } H_1 \text{ with } \mathcal{A}] \leq \mathrm{negl}(n) \implies \mathbf{Adv}_{H_0,H_1}(\mathcal{A}) = \mathrm{negl}(n).$$

Game $H_3$. This game is similar to the game $H_2$, with pairs $(a, b) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$ being chosen uniformly at
random and BAD being defined similarly. However, in this game we always return $(a, \lfloor b \rceil_p)$ to the attacker,
even when BAD occurs. By the analysis above, we have that for any (potentially unbounded) attacker $\mathcal{A}$,

$$\mathbf{Adv}_{H_2,H_3}(\mathcal{A}) \leq \Pr[\mathrm{BAD} \text{ occurs in } H_3 \text{ with } \mathcal{A}] = \Pr[\mathrm{BAD} \text{ occurs in } H_2 \text{ with } \mathcal{A}] = \mathrm{negl}(n).$$

Game $H_4$. In this game we give the attacker samples drawn uniformly from $\mathbb{Z}_q^n \times \mathbb{Z}_p$. The statistical
distance between $U(\mathbb{Z}_q^n \times \mathbb{Z}_p)$ and $U(\mathbb{Z}_q^n) \times \lfloor U(\mathbb{Z}_q) \rceil_p$ is at most $p/q = \mathrm{negl}(n)$ by assumption on $q$, so by
a union bound over all the poly($n$) samples, we have $\mathbf{Adv}_{H_3,H_4}(\mathcal{A}) = \mathrm{negl}(n)$ for any efficient attacker $\mathcal{A}$.

Finally, by the triangle inequality, we have $\mathbf{Adv}_{H_0,H_4}(\mathcal{A}) = \mathrm{negl}(n)$ for any efficient adversary $\mathcal{A}$,
which completes the proof. Essentially the same proof works for the RLWR problem as well. □
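The bound $\Pr[\mathrm{BAD}] \leq (2B + 1) \cdot p/q$ for uniform $b$ can be checked exhaustively for small parameters. The following Python sketch does so; the parameters are toy values chosen for illustration (in the theorem $q/(pB)$ is super-polynomial), and the helper names are hypothetical.

```python
def round_p(x, q, p):
    """Nearest-integer rounding from Z_q to Z_p, as in Equation (2.1)."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def is_bad(b, q, p, B):
    """BAD holds when some b + e with |e| <= B rounds to a different value than b."""
    target = round_p(b, q, p)
    return any(round_p(b + e, q, p) != target for e in range(-B, B + 1))

def bad_count(q, p, B):
    """Number of b in Z_q for which BAD occurs."""
    return sum(1 for b in range(q) if is_bad(b, q, p, B))

q, p, B = 1024, 4, 3
count = bad_count(q, p, B)            # b within distance B of one of the p rounding boundaries
# When BAD does not occur, removing any error of magnitude <= B leaves the rounding unchanged.
stable = all(round_p(b - e, q, p) == round_p(b, q, p)
             for b in range(q) if not is_bad(b, q, p, B)
             for e in range(-B, B + 1))
```

For these parameters there are $p$ rounding boundaries with $2B$ bad values around each, so `count` is $2Bp = 24$, below the bound $(2B + 1) \cdot p = 28$ out of $q = 1024$ values.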

4  Synthesizer-Based  PRFs 

We now describe the LWR-based synthesizer and our construction of a PRF from it. We first define a
pseudorandom synthesizer, slightly modified from the definition proposed by Naor and Reingold [NR95].

Let $S : A \times A \to B$ be a function (where $A$ and $B$ are finite domains, which along with $S$ are implicitly
indexed by the security parameter $n$) and let $X = (x_1, \ldots, x_k) \in A^k$ and $Y = (y_1, \ldots, y_\ell) \in A^\ell$ be two
sequences of inputs. Then $C_S(X, Y) \in B^{k \times \ell}$ is defined to be the matrix with $S(x_i, y_j)$ as its $(i, j)$th entry.
(Here C stands for combinations.)

Definition 4.1 (Pseudorandom Synthesizer). We say that a function $S : A \times A \to B$ is a pseudorandom
synthesizer if it is polynomial-time computable, and if for every poly($n$)-bounded $k = k(n)$, $\ell = \ell(n)$,

$$C_S(U(A^k), U(A^\ell)) \stackrel{c}{\approx} U(B^{k \times \ell}).$$

That is, the matrix $C_S(X, Y)$ for uniform and independent $X \leftarrow A^k$, $Y \leftarrow A^\ell$ is computationally
indistinguishable from a uniformly random $k$-by-$\ell$ matrix over $B$.


4.1  Synthesizer  Constructions 

We  now  describe  synthesizers  whose  security  is  based  on  the  (ring-)LWR  problem. 

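Definition 4.2 ((Ring-)LWR Synthesizer). For moduli $q > p \geq 2$, the LWR synthesizer is the function
$S_{n,q,p} : \mathbb{Z}_q^n \times \mathbb{Z}_q^n \to \mathbb{Z}_p$ defined as

$$S_{n,q,p}(x, y) = \lfloor \langle x, y \rangle \rceil_p.$$

The RLWR synthesizer is the function $S_{R,q,p} : R_q \times R_q \to R_p$ defined as

$$S_{R,q,p}(x, y) = \lfloor x \cdot y \rceil_p.$$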

Theorem 4.3. Assuming the hardness of decision-LWR$_{n,q,p}$ (respectively, decision-RLWR$_{R,q,p}$) for a
uniformly random secret, the function $S_{n,q,p}$ (respectively, $S_{R,q,p}$) given in Definition 4.2 above is a
pseudorandom synthesizer.

It follows generically from this theorem that the function $T_{n,q,p} : \mathbb{Z}_q^{n \times n} \times \mathbb{Z}_q^{n \times n} \to \mathbb{Z}_p^{n \times n}$, defined as
$T_{n,q,p}(X, Y) = \lfloor X \cdot Y \rceil_p$, is also a pseudorandom synthesizer, since by the definition of matrix
multiplication, we only incur a factor of $n$ increase in the length of the input sequences. This is the synthesizer that we
use below in the construction of a PRF.
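The “generic” step above rests on a simple observation: the $(i, j)$th entry of $\lfloor X \cdot Y \rceil_p$ is exactly $S_{n,q,p}$ applied to row $i$ of $X$ and column $j$ of $Y$, so $T_{n,q,p}(X, Y)$ is a combinations matrix $C_S$ over those rows and columns. The Python sketch below checks this identity on toy parameters; all names are hypothetical.

```python
import random

def round_p(x, q, p):
    """Nearest-integer rounding from Z_q to Z_p, as in Equation (2.1)."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p

def lwr_synth(x, y, q, p):
    """S_{n,q,p}(x, y) = round_p(<x, y>)."""
    return round_p(sum(a * b for a, b in zip(x, y)), q, p)

def combinations(X, Y, q, p):
    """C_S(X, Y): the matrix whose (i, j)th entry is S(x_i, y_j)."""
    return [[lwr_synth(x, y, q, p) for y in Y] for x in X]

def matrix_synth(X, Y, q, p):
    """T_{n,q,p}(X, Y) = round_p(X * Y), computed as a rounded matrix product."""
    n = len(X)
    return [[round_p(sum(X[i][t] * Y[t][j] for t in range(n)), q, p)
             for j in range(n)] for i in range(n)]

rng = random.Random(0)
n, q, p = 3, 256, 16
X = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
Y = [[rng.randrange(q) for _ in range(n)] for _ in range(n)]
columns_of_Y = [[Y[t][j] for t in range(n)] for j in range(n)]
T = matrix_synth(X, Y, q, p)
```

A sequence of $k$ matrices therefore corresponds to a sequence of $kn$ vectors, which is the factor-of-$n$ increase mentioned above.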


Proof of Theorem 4.3. Let $\ell, k = \mathrm{poly}(n)$ be arbitrary. Let $X = (x_1, \ldots, x_k)$ and $Y = (y_1, \ldots, y_\ell)$ be
uniformly random and independent sequences of $\mathbb{Z}_q^n$-vectors. Assuming the hardness of the “multiple secrets”
version of decision-LWR (see the remark following Definition 3.1), we have that the tuples

$$(x_i, \lfloor \langle x_i, y_1 \rangle \rceil_p, \ldots, \lfloor \langle x_i, y_\ell \rangle \rceil_p) \in \mathbb{Z}_q^n \times \mathbb{Z}_p^\ell$$

for $i = 1, \ldots, k$ are computationally indistinguishable from uniform and independent. That is,

$$(\{x_i\}_{i \in [k]}, C_S(X, Y)) \stackrel{c}{\approx} U(\mathbb{Z}_q^{n \times k} \times \mathbb{Z}_p^{k \times \ell}).$$

From this stronger fact, we have that $C_S(X, Y) \stackrel{c}{\approx} U(\mathbb{Z}_p^{k \times \ell})$, as desired. Essentially the same proof works
for the RLWR synthesizer as well. □


4.2  The  PRF  Construction 

Definition 4.4 ((Ring-)LWR PRF). For parameters $n \in \mathbb{N}$, input length $k = 2^d \geq 1$, and moduli
$q_d \geq q_{d-1} \geq \cdots \geq q_0 \geq 2$, the LWR family $\mathcal{F}^{(j)}$ for $0 \leq j \leq d$ is defined inductively to consist of functions from
$\{0,1\}^{2^j}$ to $\mathbb{Z}_{q_{d-j}}^{n \times n}$. We define $\mathcal{F} = \mathcal{F}^{(d)}$.

• For $j = 0$, a function $F \in \mathcal{F}^{(0)}$ is indexed by $S_b \in \mathbb{Z}_{q_d}^{n \times n}$ for $b \in \{0,1\}$, and is defined simply as
$F_{\{S_b\}}(x) = S_x$. We endow $\mathcal{F}^{(0)}$ with the distribution where the $S_b$ are uniform and independent.

• For $j \geq 1$, a function $F \in \mathcal{F}^{(j)}$ is indexed by some $F_0, F_1 \in \mathcal{F}^{(j-1)}$, and is defined as

$$F_{F_0,F_1}(x_0, x_1) = T^{(j)}(F_0(x_0), F_1(x_1))$$

where $|x_0| = |x_1| = 2^{j-1}$ and $T^{(j)} = T_{n,q_{d-j+1},q_{d-j}}$ is the appropriate synthesizer. We endow $\mathcal{F}^{(j)}$
with the distribution where $F_0$ and $F_1$ are chosen independently from $\mathcal{F}^{(j-1)}$.

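More explicitly, $F \in \mathcal{F}$ is indexed by a set of matrices $\{S_{i,b}\}$ (where $i \in [k]$, $b \in \{0,1\}$), and for input
$x = x_1 \cdots x_k$ is defined as

$$F_{\{S_{i,b}\}}(x) = \Big\lfloor \cdots \Big\lfloor \lfloor S_{1,x_1} \cdot S_{2,x_2} \rceil_{q_{d-1}} \cdot \lfloor S_{3,x_3} \cdot S_{4,x_4} \rceil_{q_{d-1}} \Big\rceil_{q_{d-2}} \cdots \lfloor S_{k-1,x_{k-1}} \cdot S_{k,x_k} \rceil_{q_{d-1}} \cdots \Big\rceil_{q_0}.$$

The ring-LWR family $\mathcal{RF}^{(j)}$ is defined similarly to consist of functions from $\{0,1\}^{2^j}$ to $R_{q_{d-j}}$, where
in the base case ($j = 0$) we replace each $S_b$ with a uniformly random $s_b \in R_{q_d}$, and in the inductive case
($j \geq 1$) we use the ring-LWR synthesizer $S^{(j)} = S_{R,q_{d-j+1},q_{d-j}}$.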


Figure  1:  The  synthesizer-based  PRF  evaluated  on  the  input  0111 ...  0 

We  remark  that  the  recursive  LWR-based  construction  above  does  not  have  to  use  square  matrices;  any 
legal  dimensions  would  be  acceptable  with  no  essential  change  to  the  security  proof.  Square  matrices  appear 
to  give  the  best  combination  of  seed  size,  computational  efficiency,  and  input/output  lengths. 
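The inductive definition unrolls into a bottom-up pass over a binary tree. The sketch below is illustrative only: `keygen` and `evaluate` are hypothetical names, the moduli are passed as the list `[q_d, q_{d-1}, ..., q_0]`, rounding is assumed to be round-half-up, and the toy parameters in any usage are far too small to be secure.

```python
import random


def round_mod(x, q, p):
    """Round x in Z_q to Z_p: floor((p/q) * x + 1/2) mod p (assumed convention)."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p


def synth(X, Y, q, p):
    """Matrix synthesizer T_{n,q,p}(X, Y) = round_p(X * Y)."""
    cols = list(zip(*Y))
    return [[round_mod(sum(a * b for a, b in zip(r, c)), q, p) for c in cols]
            for r in X]


def keygen(n, k, q_d, rng):
    """One uniform n-by-n matrix over Z_{q_d} for each position i and bit b."""
    return [[[[rng.randrange(q_d) for _ in range(n)] for _ in range(n)]
             for _bit in (0, 1)] for _pos in range(k)]


def evaluate(keys, moduli, bits):
    """Evaluate the tree; moduli = [q_d, ..., q_0] and len(bits) == 2**d."""
    layer = [keys[i][b] for i, b in enumerate(bits)]      # base case j = 0
    for j in range(1, len(moduli)):                        # levels j = 1..d
        q_in, q_out = moduli[j - 1], moduli[j]             # q_{d-j+1}, q_{d-j}
        layer = [synth(layer[t], layer[t + 1], q_in, q_out)
                 for t in range(0, len(layer), 2)]
    return layer[0]
```

For example, with `moduli = [256, 64, 16]` and a 4-bit input, the leaves are combined pairwise modulo 256 and rounded to modulus 64, and the two results are then combined modulo 64 and rounded to modulus 16.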


4.3  Efficiency 


Consider a function in either one of the families $\mathcal{F}$ or $\mathcal{RF}$ from Definition 4.4. Computing the function at
any given point $x \in \{0,1\}^k$ can be done in a tree-like fashion using a tree of depth $d = \lg k$, where each
node of the tree corresponds to an evaluation of an appropriate synthesizer. Each synthesizer involves a single
matrix (or ring) product mod $q_{d-j+1}$, followed by a rounding step. Here we discuss implementations of the
synthesizers, describing both the simplest practical methods along with depth-optimized parallel solutions
(which rely on preprocessing and use larger circuits). In summary, the synthesizers can be computed by
small, low-depth arithmetic circuits; moreover, in principle they can be implemented in $\mathrm{TC}^0$, the class of
constant-depth, poly(n)-sized circuits with unbounded fan-in and threshold gates (which is a subset of $\mathrm{NC}^1$).
Therefore, the PRFs can be implemented in $\mathrm{TC}^1$, which matches the best constructions from [NR95].

In a practical sequential implementation, we can use any fast matrix multiplication algorithm (e.g.,
Strassen's), and we can multiply ring elements (in the standard power basis) in $O(n \log n)$ scalar operations
mod $q$ (see, e.g., [LPR10]). In a practical parallel implementation, we can compute a matrix multiplication in
the natural way using a size-$O(n^2)$, depth-2 arithmetic circuit over $\mathbb{Z}_q$, where the first layer of multiplication
gates has fan-in 2 and the second layer of addition gates has fan-in $n$. The same is true for a product of ring
elements in $R_q$, since it can be expressed as a matrix-vector product: multiplication by any fixed element
$a \in R_q$ is a linear transformation.

For computing a synthesizer $T_{n,q,p}$ in $\mathrm{TC}^0$, we note that a matrix product consists of $n^2$ parallel inner
products of $n$-dimensional vectors, which each involve a multi-sum of (binary) scalar products modulo $q$.
The subsequent rounding step simply amounts to dropping some of the least-significant digits if $q$ and $p$ are
both powers of the same small base, or more generally, multiplying by $p/q$ (under suitable precision) and
truncating. Both operations can be performed in $\mathrm{TC}^0$, for any $q = 2^{\mathrm{poly}(n)}$ [RT92].
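When $q$ and $p$ are both powers of two, the rounding step reduces to a shift. The sketch below assumes the round-half-up convention $\lfloor (p/q) \cdot x + 1/2 \rfloor \bmod p$, under which rounding adds half of $q/p$ before the low-order bits are dropped; the function names are hypothetical.

```python
def round_mod(x, q, p):
    """General rounding from Z_q to Z_p in exact integer arithmetic."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p


def round_pow2(x, log_q, log_p):
    """Same map for q = 2**log_q > p = 2**log_p: add half of q/p, drop low bits."""
    shift = log_q - log_p                       # number of bits discarded
    x %= 1 << log_q
    return ((x + (1 << (shift - 1))) >> shift) % (1 << log_p)
```

Both functions agree on every input when $q > p$ are powers of two, so the general multiply-and-truncate path is only needed for other moduli.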

Interestingly, once we allow for threshold gates, there seems to be no asymptotic improvement in depth
for the ring-based synthesizer $S_{R,q,p}$. This is because threshold circuits enable binary matrix product to be
computed in constant depth, and the depth of computing the PRF is anyway dominated by $d$, the depth of the
tree. The gains in efficiency obtained by using a ring-based construction will be much more pronounced in
the case of the degree-$k$ synthesizers described in Section 5. We discuss these gains in detail in Section 5.2.

We remark that Naor and Reingold [NR95] describe several nice optimizations and additional features of
their  synthesizer-based  PRFs,  including  compression  of  the  secret  key  and  faster  amortized  computation  for  a 
sequence  of  related  inputs.  Our  functions  are  amenable  to  all  these  techniques  as  well. 

4.4  Security  Proof 

The security proof for our PRF hinges on the fact that the functions $T^{(j)} = T_{n,q_{d-j+1},q_{d-j}}$ are synthesizers for
appropriate choices of the moduli. In fact, the proof is essentially identical to Naor and Reingold's [NR95]
for  their  PRF  construction  from  pseudorandom  synthesizers;  the  only  reason  we  cannot  use  their  theorem 
exactly  as  stated  is  because  they  assume  that  the  synthesizer  output  is  exactly  the  same  size  as  its  two  inputs, 
which  is  not  quite  the  case  with  our  synthesizer  due  to  the  modulus  reduction.  This  is  a  minor  detail  that  does 
not  change  the  proof  in  any  material  way;  it  only  limits  the  number  of  times  we  may  compose  the  synthesizer, 
and  hence  the  input  length  of  the  PRF. 

Theorem 4.5. Assuming that $T^{(j)} = T_{n,q_{d-j+1},q_{d-j}}$ is a pseudorandom synthesizer for every $j \in [d]$ (in
particular, assuming the hardness of decision-$\mathrm{LWR}_{n,q_{d-j+1},q_{d-j}}$), the LWR family $\mathcal{F}$ from Definition 4.4 is a
pseudorandom function family.

The same holds for the ring-LWR family $\mathcal{RF}$, assuming that $S^{(j)} = S_{R,q_{d-j+1},q_{d-j}}$ is a pseudorandom
synthesizer for every $j \in [d]$ (in particular, assuming the hardness of decision-$\mathrm{RLWR}_{R,q_{d-j+1},q_{d-j}}$).

Proof. We give a detailed proof for the family $\mathcal{F}$; the one for $\mathcal{RF}$ proceeds essentially identically. We prove
that each $\mathcal{F}^{(j)}$ is a pseudorandom function family by induction for $j = 0, \ldots, d$. The case $j = 0$ is trivial by
construction of $\mathcal{F}^{(0)}$. Assuming the inductive hypothesis on $\mathcal{F}^{(j-1)}$ for some $j \geq 1$, we prove the claim for
$\mathcal{F}^{(j)}$ via the following series of games.

Game $H_0$. This is the PRF attack game against $\mathcal{F}^{(j)}$; we choose an $F \leftarrow \mathcal{F}^{(j)}$, i.e., choose $F_0, F_1 \leftarrow \mathcal{F}^{(j-1)}$
independently, and give the attacker oracle access to $F_{F_0,F_1}(x_0, x_1) = T^{(j)}(F_0(x_0), F_1(x_1))$, where
as always $|x_0| = |x_1| = 2^{j-1}$.

Game $H_1$. We replace $F_0, F_1$ above with truly uniform functions. Specifically, we (lazily) choose two
uniform and independent functions $U_0, U_1 : \{0,1\}^{2^{j-1}} \to \mathbb{Z}_{q_{d-j+1}}^{n \times n}$. On each query $x = (x_0, x_1) \in \{0,1\}^{2^j}$,
we return $T^{(j)}(U_0(x_0), U_1(x_1))$. By a trivial reduction using the inductive hypothesis that $\mathcal{F}^{(j-1)}$ is a
PRF family (so $F_0, F_1$ are computationally indistinguishable from $U_0, U_1$ given query access), this game is
computationally indistinguishable from $H_0$.

Game $H_2$. We give the attacker oracle access to a (lazily defined) uniform function $U : \{0,1\}^{2^j} \to \mathbb{Z}_{q_{d-j}}^{n \times n}$.

We claim that games $H_1$ and $H_2$ are computationally indistinguishable, because $T^{(j)}$ is a synthesizer
by hypothesis. Suppose that an efficient adversary $\mathcal{A}$ makes at most $Q = \mathrm{poly}(n)$ total queries. We design
an efficient simulator $\mathcal{S}$ which, given input $(Z_{i,j})_{i,j \in [Q]} \in (\mathbb{Z}_{q_{d-j}}^{n \times n})^{Q \times Q}$ where either $Z_{i,j} = T^{(j)}(X_i, Y_j)$
for some uniformly random and independent $X_i, Y_j \in \mathbb{Z}_{q_{d-j+1}}^{n \times n}$ for $i, j \in [Q]$, or each $Z_{i,j}$ is uniformly
random and independent, simulates game $H_1$ or $H_2$, respectively. Because the two types of inputs to $\mathcal{S}$ are
computationally indistinguishable by assumption on $T^{(j)}$ (and $\mathcal{S}$ is efficient), it follows that games $H_1$ and
$H_2$ are indistinguishable as well.

$\mathcal{S}$ works as follows: starting from $i = j = 1$, on each query $x = (x_0, x_1) \in \{0,1\}^{2^j}$ from $\mathcal{A}$, look up
whether $x_0$ (respectively, $x_1$) is already associated with an index $i$ (resp., $j$); if not, associate it with the
current value of $i$ (resp., $j$) and increment that variable. Return the associated matrix $Z_{i,j}$ to $\mathcal{A}$. It is clear by
inspection that the behavior of $\mathcal{S}$ is as claimed above.
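The bookkeeping performed by the simulator can be sketched as follows. This is an illustration of the index-assignment idea only: `SynthSimulator` is a hypothetical name, indices are 0-based rather than 1-based, and the table `Z` is whatever $Q \times Q$ array the simulator was handed.

```python
class SynthSimulator:
    """Answers queries (x0, x1) from a fixed Q-by-Q table Z of matrices."""

    def __init__(self, Z):
        self.Z = Z
        self.row = {}   # left half x0  -> row index i
        self.col = {}   # right half x1 -> column index j

    def query(self, x0, x1):
        # A half that was seen before keeps its index; a new half gets the
        # next unused index (the current size of the dictionary).
        i = self.row.setdefault(x0, len(self.row))
        j = self.col.setdefault(x1, len(self.col))
        return self.Z[i][j]
```

Because at most $Q$ queries are made, neither index ever exceeds the table size, and repeated halves are answered consistently, which is what makes the simulation match game $H_1$ or $H_2$.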

We conclude that game $H_0$ is computationally indistinguishable from game $H_2$, i.e., that $\mathcal{F}^{(j)}$ is a
pseudorandom function family, as desired. □


5  Direct  PRF  Constructions 

Here we present another, potentially more efficient construction of a pseudorandom function family whose
security is based on the intractability of the LWE problem.


5.1  Constructions 


Definition 5.1 ((Ring-)LWE degree-$k$ PRF). For parameters $n \in \mathbb{N}$, moduli $q \geq p \geq 2$, positive integer
$m = \mathrm{poly}(n)$, and input length $k \geq 1$, the family $\mathcal{F}$ consists of functions from $\{0,1\}^k$ to $\mathbb{Z}_p^{m \times n}$. A function
$F \in \mathcal{F}$ is indexed by some $A \in \mathbb{Z}_q^{n \times m}$ and $S_i \in \mathbb{Z}^{n \times n}$ for each $i \in [k]$, and is defined as

$$F(x) = F_{A,\{S_i\}}(x_1 \cdots x_k) := \Big\lfloor A^t \cdot \prod_{i=1}^{k} S_i^{x_i} \Big\rceil_p. \qquad (5.1)$$

We endow $\mathcal{F}$ with the distribution where $A$ is chosen uniformly at random, and below we consider a number
of natural distributions for the $S_i$.

The ring-based family $\mathcal{RF}$ is defined similarly to consist of functions from $\{0,1\}^k$ to $R_p$, where we
replace $A$ with uniformly random $a \in R_q$, and each $S_i$ with some $s_i \in R$.
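Equation (5.1) is a rounded subset-product, which the following sketch evaluates left to right. It is a toy illustration under stated assumptions: `direct_prf` and `matmul` are hypothetical names, `At` holds $A^t$ as an $m$-by-$n$ list of rows, rounding is round-half-up, and the parameters used in any example are not secure.

```python
def round_mod(x, q, p):
    """Round x in Z_q to Z_p: floor((p/q) * x + 1/2) mod p (assumed convention)."""
    return ((2 * p * (x % q) + q) // (2 * q)) % p


def matmul(X, Y, q):
    """Matrix product over Z_q."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(r, c)) % q for c in cols] for r in X]


def direct_prf(At, S, bits, q, p):
    """F(x) = round_p(A^t * prod_{i : x_i = 1} S_i), multiplying in index order."""
    acc = At
    for S_i, x_i in zip(S, bits):
        if x_i:                       # S_i^{x_i} is the identity when x_i = 0
            acc = matmul(acc, S_i, q)
    return [[round_mod(v, q, p) for v in row] for row in acc]
```

Only the final product is rounded, in contrast to the synthesizer-based construction, which rounds at every level of the tree.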


5.2  Efficiency 

Consider a function $F \in \mathcal{F}$ as in Definition 5.1. Computing the function involves a subset-product of
matrices. Generally speaking, matrix multi-product does not appear to be computable in $\mathrm{TC}^0$ (if it were, then
$\mathrm{TC}^0$ would equal $\mathrm{NC}^1$ [MP00]). However, in our case the matrices are known in advance (the variable input
is the subset), so it may be possible to reduce the depth of the computation via preprocessing, using ideas
from [RT92]. As described in Section 4.3, both binary matrix product and rounding can be implemented with
simple depth-2 arithmetic circuits, and hence in $\mathrm{TC}^0$, so at worst $F$ can be computed in $\mathrm{TC}^1$ by computing
the subset product in a tree-like fashion, followed by a final rounding step.

The ring variant of Construction 5.1 appears to be more efficient to evaluate, both in practice and in
terms of the best theoretical depth. Consider a function $F \in \mathcal{RF}$ as in Definition 5.1. As is standard
with ring-based primitives (see, e.g., [LMPR08, LPR10]), one could store the ring elements $a, s_1, \ldots, s_k$ as
vectors in $\mathbb{Z}_q^n$ using the discrete Fourier transform or "Chinese remainder" representation modulo $q$ (that is, by
evaluating $a$ and the $s_i$ as polynomials at the $n$ roots of $z^n + 1$ modulo $q$), so that multiplication of two ring
elements just corresponds to a coordinate-wise product of their vectors. Then to evaluate the function, one
would just compute a subset-product of the appropriate vectors, then interpolate the result to the power-basis
representation, using essentially an $n$-dimensional Fast Fourier Transform over $\mathbb{Z}_q$, in order to perform
the rounding operation. For the interesting case of $k = \omega(\log n)$, the sequential runtime of this method is
dominated by the $kn$ scalar multiplications in $\mathbb{Z}_q$ to compute the subset-product; in parallel, the arithmetic
depth (over $\mathbb{Z}_q$) is $O(\log(nk))$. Alternatively, the subset-product part of the function might be computed even
faster by storing the discrete logs, with respect to some arbitrary generator $g$ of $\mathbb{Z}_q^*$, of the Fourier coefficients
of $a$ and the $s_i$ (see footnote 6). The subset-product then becomes a subset-sum, followed by exponentiation modulo $q$, or even
just a table lookup if $q$ is relatively small. Assuming that additions mod $q - 1$ are significantly less expensive
than multiplications mod $q$, the sequential runtime of this method is dominated by the $O(n \log n)$ scalar
operations in the FFT, and the parallel arithmetic depth is again $O(\log n)$.
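The discrete-log trick can be illustrated on small vectors of nonzero Fourier coefficients. This sketch assumes a prime modulus $q$ and a generator $g$ of $\mathbb{Z}_q^*$; the names `dlog_table` and `subset_product_via_logs` are hypothetical, the table is built by brute force (reasonable only for small $q$), and zero coefficients, which would need the mask vectors mentioned in the footnote, are not handled.

```python
def dlog_table(g, q):
    """Map each element of Z_q^* to its discrete log base g (q prime, g a generator)."""
    table, v = {}, 1
    for e in range(q - 1):
        table[v] = e
        v = (v * g) % q
    return table


def subset_product_via_logs(logs, bits, g, q):
    """Coordinate-wise subset-product computed as a subset-sum of discrete logs.

    logs[i][c] is the discrete log of coordinate c of the i-th stored vector.
    """
    n = len(logs[0])
    sums = [sum(l[c] for l, b in zip(logs, bits) if b) % (q - 1)
            for c in range(n)]
    return [pow(g, s, q) for s in sums]       # or a table lookup for small q
```

For instance, with $q = 17$ and $g = 3$, the subset-sum of logs for the selected vectors exponentiates back to exactly their coordinate-wise product modulo 17.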

In terms of theoretical depth, the multi-product of vectors can be performed in $\mathrm{TC}^0$, as can the Fast Fourier
Transform and rounding steps [RT92]. This implies that the entire function can be computed in $\mathrm{TC}^0$, matching
(asymptotically) the shallowest known PRFs based on the DDH and factoring problems [NR97, NRR00].


5.3  Security  Proof  Under  LWE 

Our first theorem says that when the entries of the $S_i$ are "small," i.e., chosen from a suitable LWE error
distribution, the degree-$k$ construction is a PRF under a suitable LWE assumption.

Theorem 5.2. Let $\chi = D_{\mathbb{Z},r}$ for some $r > 0$, and let $q \geq p \cdot k(Cr\sqrt{n})^k \cdot n^{\omega(1)}$ for a suitable universal
constant $C$. Endow the family $\mathcal{F}$ from Definition 5.1 with the distribution where each $S_i$ is drawn independently
from $\chi^{n \times n}$. Then assuming the hardness of decision-$\mathrm{LWE}_{n,q,\chi}$, the family $\mathcal{F}$ is pseudorandom.

An analogous theorem holds for the ring-based family $\mathcal{RF}$, under decision-RLWE.

Theorem 5.3. Let $\chi$ be the distribution over the ring $R$ where each coefficient (with respect to the power
basis) is chosen independently from $D_{\mathbb{Z},r}$ for some $r > 0$, and let $q \geq p \cdot k(r\sqrt{n} \cdot \omega(\sqrt{\log n}))^k \cdot n^{\omega(1)}$.
Endow the family $\mathcal{RF}$ from Definition 5.1 with the distribution where each $s_i$ is drawn independently from $\chi$.
Then assuming the hardness of decision-$\mathrm{RLWE}_{R,q,\chi}$, the family $\mathcal{RF}$ is pseudorandom.

We first prove Theorem 5.2 for the standard LWE construction.

Proof of Theorem 5.2. To aid the proof, it helps to define a family $\mathcal{G}$ of functions $G : \{0,1\}^k \to \mathbb{Z}_q^{m \times n}$, which
are simply the unrounded counterparts of the functions in $\mathcal{F}$. That is, for $A \in \mathbb{Z}_q^{n \times m}$ and $S_i \in \mathbb{Z}^{n \times n}$ for
$i \in [k]$, we define $G_{A,\{S_i\}}(x_1 \cdots x_k) := A^t \cdot \prod_{i=1}^{k} S_i^{x_i}$. We endow $\mathcal{G}$ with the same distribution over $A$ and
the $S_i$ as $\mathcal{F}$ has.

Footnote 6: If necessary, one would also store binary mask vectors indicating which Fourier coefficients are zero, and hence not in $\mathbb{Z}_q^*$.

We proceed via a sequence of games, much like in the proof of Theorem 3.2. First, as a "thought
experiment," we define a new family $\tilde{\mathcal{G}}$ of functions from $\{0,1\}^k$ to $\mathbb{Z}_q^{m \times n}$. This family is a counterpart to $\mathcal{G}$,
but with two important differences: it is a PRF family without any rounding (and hence, with rounding as
well), but each function in the family has an exponentially large key. Alternatively, one may think of the
functions in $\tilde{\mathcal{G}}$ as randomized functions with small keys. Then we show that with overwhelming probability,
the rounding of $\tilde{G} \leftarrow \tilde{\mathcal{G}}$ agrees with the rounding of the corresponding $G \in \mathcal{G}$ on all the attacker's queries,
because the outputs of the two functions are relatively close. It follows that the rounding of $G \leftarrow \mathcal{G}$ (i.e.,
$F \leftarrow \mathcal{F}$) cannot be distinguished from a uniformly random function, as desired.

More  formally,  we  define  the  following  games: 


Game $H_0$. This is the real PRF attack game against the family $\mathcal{F}$: we choose an $F \leftarrow \mathcal{F}$ (so $F(\cdot) = \lfloor G(\cdot) \rceil_p$
for $G \leftarrow \mathcal{G}$), and the attacker has oracle access to $F(\cdot)$.


Game $H_1$. Here we instead choose $\tilde{G} \leftarrow \tilde{\mathcal{G}}$, where the family $\tilde{\mathcal{G}}$ is given in Definition 5.4 below. The choice
of $\tilde{G}$ induces a corresponding $G \in \mathcal{G}$ having the same distribution as in $H_0$. (This is simply because the key
of $G$ is just a portion of the key of $\tilde{G}$.) To be precise, we choose $\tilde{G}$ "lazily" as the attacker makes queries,
because the description of $\tilde{G}$ has exponential size; see the remarks following Definition 5.4 for details.

The attacker has oracle access to $\lfloor \tilde{G}(\cdot) \rceil_p$, but with one exception: on query $x$, define the "bad event"
$\mathrm{BAD}_x$ for that query to be

$$\lfloor \tilde{G}(x) + [-B, B]^{m \times n} \rceil_p \neq \{\lfloor \tilde{G}(x) \rceil_p\},$$

where $B = k(Cr\sqrt{n})^k$. That is, $\mathrm{BAD}_x$ indicates whether any entry of $\tilde{G}(x) \in \mathbb{Z}_q^{m \times n}$ is "too close" to
another element of $\mathbb{Z}_q$ that rounds to a different value in $\mathbb{Z}_p$. Note that a $y \in \mathbb{Z}_q$ is "too close" in this sense if
and only if $\lfloor (\bar{y} - B) \cdot p/q \rceil \neq \lfloor (\bar{y} + B) \cdot p/q \rceil$, where $\bar{y} \in \mathbb{Z}$ is any integer congruent to $y$ mod $q$, so $\mathrm{BAD}_x$
can be efficiently detected given only the value of $\tilde{G}(x)$. If $\mathrm{BAD}_x$ occurs on any of the attacker's queries, then
the game immediately aborts.

In Lemma 5.5 below, we show that for every fixed $x \in \{0,1\}^k$, with overwhelming probability over the
choice of $\tilde{G} \leftarrow \tilde{\mathcal{G}}$ and the induced $G \in \mathcal{G}$, it is the case that $G(x) \in \tilde{G}(x) + [-B, B]^{m \times n} \bmod q$. Hence
$\lfloor G(x) \rceil_p = \lfloor \tilde{G}(x) \rceil_p$ so long as $\mathrm{BAD}_x$ does not occur, and the attacker's queries are answered exactly as they
are in $H_0$, subject to the game not aborting. It follows that for any (potentially unbounded) attacker $\mathcal{A}$,

$$\mathrm{Adv}_{H_0,H_1}(\mathcal{A}) \leq \Pr[\text{some } \mathrm{BAD}_x \text{ occurs in } H_1 \text{ with attacker } \mathcal{A}] + \mathrm{negl}(n). \qquad (5.2)$$

We do not directly bound the probability that some $\mathrm{BAD}_x$ occurs in $H_1$, but instead defer to the analysis of
the next game, where we can show that it is indeed negligible.
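Detecting the bad event for a single entry needs only two roundings. The sketch below is an illustration under assumptions: it uses round-half-up rounding and compares the unreduced rounded values of the two endpoints $\bar{y} \pm B$, which, since rounding is monotone, differ exactly when some value in the interval rounds differently from $y$; the names `round_int` and `is_bad` are hypothetical.

```python
def round_int(y, q, p):
    """floor((p/q) * y + 1/2) for an integer y, without reducing mod p."""
    return (2 * p * y + q) // (2 * q)


def is_bad(y, B, q, p):
    """True if some y + e with |e| <= B rounds to a different value than y."""
    y %= q                        # any integer representative of y works
    return round_int(y - B, q, p) != round_int(y + B, q, p)
```

With the toy values $q = 64$, $p = 4$, $B = 2$, the bad entries are the four integers around each of the four rounding boundaries, which is consistent with the $(2B + 1) \cdot p/q$ bound on the probability of the bad event for a uniform entry used later in the proof.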


Game $H_2$. Here we choose $U$ to be a uniformly random function from $\{0,1\}^k$ to $\mathbb{Z}_q^{m \times n}$ (defined "lazily"
as the attacker makes queries). The attacker has oracle access to $\lfloor U(\cdot) \rceil_p$, with the same "bad event" and
abort condition as in $H_1$, but defined relative to $U$ instead of $\tilde{G}$.

In Theorem 5.6 below, we show that under the LWE assumption from the theorem statement, no efficient
adversary can distinguish (given oracle access) between $\tilde{G} \leftarrow \tilde{\mathcal{G}}$ and a uniformly random function $U$. Because
the $\mathrm{BAD}_x$ event in $H_1$ (respectively, $H_2$) for a query $x$ can be tested efficiently given query access to $\tilde{G}$
(resp., $U$), a trivial simulation implies that for any efficient attacker $\mathcal{A}$, we have $\mathrm{Adv}_{H_1,H_2}(\mathcal{A}) \leq \mathrm{negl}(n)$.
For the same reasons, it also follows by a straightforward simulation that for any efficient attacker $\mathcal{A}$,

$$\big|\Pr[\text{some } \mathrm{BAD}_x \text{ occurs in } H_1 \text{ with } \mathcal{A}] - \Pr[\text{some } \mathrm{BAD}_x \text{ occurs in } H_2 \text{ with } \mathcal{A}]\big| \leq \mathrm{negl}(n).$$

In $H_2$, because $U$ is a uniformly random function, for any particular query $x$ the probability that $\mathrm{BAD}_x$
occurs is bounded by $(2B + 1) \cdot p/q = \mathrm{negl}(n)$, by assumption on $q$. By a union bound over all poly(n)
queries of an efficient $\mathcal{A}$, and then applying Equation (5.2), we therefore have that

$$\Pr[\text{some } \mathrm{BAD}_x \text{ occurs in } H_1 \text{ with } \mathcal{A}] = \mathrm{negl}(n) \implies \mathrm{Adv}_{H_0,H_1}(\mathcal{A}) = \mathrm{negl}(n).$$


Game $H_3$. Here we still choose a uniformly random function $U$ and give the attacker oracle access to
$\lfloor U(\cdot) \rceil_p$. For each query $x$ we define the event $\mathrm{BAD}_x$ as in game $H_2$, but still answer the query and continue
with the game even if $\mathrm{BAD}_x$ occurs. From the above analysis of $H_2$ it follows that for any (potentially
unbounded) attacker $\mathcal{A}$ making poly(n) queries, we have

$$\mathrm{Adv}_{H_2,H_3}(\mathcal{A}) \leq \Pr[\text{some } \mathrm{BAD}_x \text{ occurs in } H_2 \text{ with } \mathcal{A}] = \mathrm{negl}(n).$$

Finally, observe that $\lfloor U(\cdot) \rceil_p$ is a truly random function from $\{0,1\}^k$ to $\mathbb{Z}_p^{m \times n}$, up to the bias involved in
rounding the uniform distribution on $\mathbb{Z}_q$ to $\mathbb{Z}_p$. Because $q \geq p \cdot n^{\omega(1)}$, this bias is negligible (and there is no
bias if $p$ divides $q$).

By the triangle inequality, it follows that for any efficient $\mathcal{A}$, we have $\mathrm{Adv}_{H_0,H_3}(\mathcal{A}) = \mathrm{negl}(n)$, and this
completes the proof. □


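We now define the family $\tilde{\mathcal{G}}$ used in the proof of Theorem 5.2.

Definition 5.4. For parameters $n, q, m, k$ and error distribution $\chi$ (over $\mathbb{Z}$) as in Definition 5.1, the family $\tilde{\mathcal{G}}^{(i)}$
for $0 \leq i \leq k$ is defined inductively to consist of functions from $\{0,1\}^i$ to $\mathbb{Z}_q^{m \times n}$; we define $\tilde{\mathcal{G}} = \tilde{\mathcal{G}}^{(k)}$.

• For $i = 0$, a function $\tilde{G} \in \tilde{\mathcal{G}}^{(0)}$ is indexed by some $A \in \mathbb{Z}_q^{n \times m}$, and is defined simply as $\tilde{G}_A(\varepsilon) = A^t$.
We endow $\tilde{\mathcal{G}}^{(0)}$ with the distribution where $A$ is chosen uniformly at random.

• For $i \geq 1$, a function $\tilde{G} \in \tilde{\mathcal{G}}^{(i)}$ is indexed by some $\tilde{G}' \in \tilde{\mathcal{G}}^{(i-1)}$, plus an $S_i \in \mathbb{Z}^{n \times n}$ and error
matrices $E_{x'} \in \mathbb{Z}^{m \times n}$ for each $x' \in \{0,1\}^{i-1}$ (where $\{0,1\}^0$ is the singleton set $\{\varepsilon\}$). For $x =
(x', x_i) \in \{0,1\}^i$ where $|x'| = i - 1$, the function is defined as

$$\tilde{G}(x) = \tilde{G}_{\tilde{G}',S_i,\{E_{x'}\}}(x', x_i) := \tilde{G}'(x') \cdot S_i^{x_i} + x_i \cdot E_{x'} \bmod q.$$

We endow $\tilde{\mathcal{G}}^{(i)}$ with the distribution where $\tilde{G}' \leftarrow \tilde{\mathcal{G}}^{(i-1)}$, and all the entries of $S_i$ and every $E_{x'}$ are
chosen independently from $\chi$.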


Note that a function $\tilde{G} \in \tilde{\mathcal{G}}$ is fully specified by $A$, $\{S_i\}_{i \in [k]}$, and exponentially (in $k$) many error
matrices $E_{x'}$ for all $x' \in \{0,1\}^{i-1}$ and $i \in [k]$; these error matrices are what prevents $\tilde{\mathcal{G}}$ itself from being
used as a PRF family. However, as needed in the proof of Theorem 5.2 (game $H_1$), the error matrices can be
chosen "lazily," since the value of $\tilde{G}(x)$ depends only on $A$, $\{S_i\}$, and $E_{x_1 \cdots x_{i-1}}$ for $i \in [k]$. For a function
$\tilde{G} = \tilde{G}_{A,\{S_i\},\{E_{x'}\}} \in \tilde{\mathcal{G}}$, we define its induced function in the family $\mathcal{G}$ to be $G = G_{A,\{S_i\}}$. Note that for
$\tilde{G} \leftarrow \tilde{\mathcal{G}}$, the induced function $G$ has the same marginal distribution as if it had been chosen from $\mathcal{G}$ directly.
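Lazy sampling of the exponentially many error matrices can be sketched with a dictionary keyed by the input prefix. This is an illustration only: `LazyGTilde` is a hypothetical name, `At` holds $A^t$ as an $m$-by-$n$ list of rows, and `sample_error` stands in for the error distribution $\chi$ (any usage with a toy distribution such as uniform on $\{-1, 0, 1\}$ is an assumption, not the discrete Gaussian of the theorem).

```python
import random


def matmul(X, Y, q):
    """Matrix product over Z_q."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(r, c)) % q for c in cols] for r in X]


class LazyGTilde:
    """Evaluates G~(x), drawing each error matrix E_{x'} only when first needed."""

    def __init__(self, At, S, q, sample_error, rng):
        self.At, self.S, self.q = At, S, q
        self.sample_error, self.rng = sample_error, rng
        self.E = {}      # prefix x' (tuple of bits) -> error matrix

    def _error(self, prefix):
        if prefix not in self.E:
            m, n = len(self.At), len(self.At[0])
            self.E[prefix] = [[self.sample_error(self.rng) for _ in range(n)]
                              for _ in range(m)]
        return self.E[prefix]

    def __call__(self, bits):
        acc = self.At                                    # G~(empty input) = A^t
        for i, x_i in enumerate(bits):
            if x_i:      # G~(x', 1) = G~'(x') * S_i + E_{x'};  G~(x', 0) = G~'(x')
                E = self._error(tuple(bits[:i]))
                prod = matmul(acc, self.S[i], self.q)
                acc = [[(a + e) % self.q for a, e in zip(ra, re)]
                       for ra, re in zip(prod, E)]
        return acc
```

Only prefixes that are followed by a 1 bit in some query ever receive an error matrix, so the state grows with the number of queries rather than with $2^k$.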

The following lemma is used in the analysis of game $H_1$.


Lemma 5.5. Let $x \in \{0,1\}^k$ be arbitrary. Then except with $2^{-\Omega(n)}$ probability over the choice of $\tilde{G} =
\tilde{G}_{A,\{S_i\},\{E_{x'}\}} \leftarrow \tilde{\mathcal{G}}$ and its induced function $G = G_{A,\{S_i\}} \in \mathcal{G}$, we have

$$\tilde{G}(x) \in G(x) + [-B, B]^{m \times n} \bmod q$$

for some $B = k \cdot (Cr\sqrt{n})^k$, where $C$ is a universal constant.

Proof. Observe that

$$\tilde{G}(x_1 \cdots x_k) = (\cdots((A^t \cdot S_1^{x_1} + x_1 \cdot E_{\varepsilon}) \cdot S_2^{x_2} + x_2 \cdot E_{x_1}) \cdots) \cdot S_k^{x_k} + x_k \cdot E_{x_1 \cdots x_{k-1}} \bmod q$$

$$= \underbrace{A^t \cdot \prod_{i=1}^{k} S_i^{x_i}}_{G(x)} + x_1 \cdot E_{\varepsilon} \cdot \prod_{i=2}^{k} S_i^{x_i} + x_2 \cdot E_{x_1} \cdot \prod_{i=3}^{k} S_i^{x_i} + \cdots + x_k \cdot E_{x_1 \cdots x_{k-1}} \bmod q.$$

Now by Lemma 2.4, except with probability $2^{-\Omega(n)}$, for every $i \in [k]$ we have $s_1(S_i) \leq O(r\sqrt{n})$ and
$\|e\| \leq O(r\sqrt{n})$ for every row $e$ of the error matrices $E_{x_1 \cdots x_{i-1}}$. Therefore, each row of the $k$ cumulative
error matrices $E_{x_1 \cdots x_{i-1}} \cdot \prod_{j=i+1}^{k} S_j^{x_j}$ (for $i \in [k]$) has Euclidean length at most $O(r\sqrt{n})^k$, and so its entries
are bounded by the same quantity in magnitude. The claim follows. □


Theorem 5.6. Under the LWE assumption from the statement of Theorem 5.2, the family $\tilde{\mathcal{G}}$ of functions from
$\{0,1\}^k$ to $\mathbb{Z}_q^{m \times n}$ is pseudorandom.

In  the  proof  we  will  need  the  following  intermediate  function  families. 


Definition 5.7. For $n, q, m$, and $\chi$ as in Definition 5.1 and an integer $i \geq 1$, the family $\mathcal{H}^{(i)}$ consists of
functions from $\{0,1\}^i$ to $\mathbb{Z}_q^{m \times n}$. A function $H$ from the family is indexed by some $S_i \in \mathbb{Z}^{n \times n}$ and matrices
$A_{x'} \in \mathbb{Z}_q^{n \times m}$, $E_{x'} \in \mathbb{Z}^{m \times n}$ for each $x' \in \{0,1\}^{i-1}$ (where $\{0,1\}^0 = \{\varepsilon\}$). It is defined as

$$H(x) = H_{S_i,\{A_{x'}\},\{E_{x'}\}}(x', x_i) := A_{x'}^t \cdot S_i^{x_i} + x_i \cdot E_{x'} \bmod q,$$

where $|x'| = i - 1$. We endow $\mathcal{H}^{(i)}$ with the distribution where each $A_{x'}$ is uniformly random and independent,
and all the entries of $S_i$ and $E_{x'}$ are chosen independently from $\chi$. We remark that an $H \leftarrow \mathcal{H}^{(i)}$ can be
chosen "lazily" in the natural way.

Proof of Theorem 5.6. We prove that each family $\tilde{\mathcal{G}}^{(i)}$ is pseudorandom by induction on $i$, from 0 to $k$. The
base case of $i = 0$ is trivial by construction. For $i \geq 1$, we prove the claim by the following series of games.

Game $H_0$. We (lazily) choose a $\tilde{G} \leftarrow \tilde{\mathcal{G}}^{(i)}$ and give the attacker oracle access to $\tilde{G}(\cdot)$.

Game $H_1$. We (lazily) choose an $H \leftarrow \mathcal{H}^{(i)}$ (defined above) and give the attacker oracle access to $H(\cdot)$.

We claim that $H_0 \stackrel{c}{\approx} H_1$ under the inductive hypothesis that $\tilde{\mathcal{G}}^{(i-1)}$ is a PRF family. To prove this, we
design an efficient simulator $\mathcal{S}$ that is given oracle access to a function $F : \{0,1\}^{i-1} \to \mathbb{Z}_q^{m \times n}$, where $F$ is
either $\tilde{G}' \leftarrow \tilde{\mathcal{G}}^{(i-1)}$ or a uniformly random function, and $\mathcal{S}$ emulates either game $H_0$ or $H_1$ (respectively) to
an attacker. The simulator $\mathcal{S}$ first chooses an $S_i \leftarrow \chi^{n \times n}$, and on each query $x = (x', x_i)$ from the attacker
where $|x'| = i - 1$, $\mathcal{S}$ queries its oracle to get $A_{x'}^t = F(x')$, chooses an $E_{x'} \leftarrow \chi^{m \times n}$ (if it has not already
been defined by a previous query), and returns $A_{x'}^t \cdot S_i^{x_i} + x_i \cdot E_{x'}$ to the attacker. It is clear by the definitions
of $\tilde{\mathcal{G}}^{(i)}$ and $\mathcal{H}^{(i)}$ that if $F$ is some $\tilde{G}' \leftarrow \tilde{\mathcal{G}}^{(i-1)}$, then $\mathcal{S}$ emulates access to $\tilde{G}_{\tilde{G}',S_i,\{E_{x'}\}} \in \tilde{\mathcal{G}}^{(i)}$ with the
appropriate distribution, whereas if $F$ is a uniformly random function, then $\mathcal{S}$ emulates access to $H \leftarrow \mathcal{H}^{(i)}$.

Game $H_2$. We (lazily) choose a uniformly random function $U : \{0,1\}^i \to \mathbb{Z}_q^{m \times n}$ and give the attacker
oracle access to $U(\cdot)$.

We claim that $H_1 \stackrel{c}{\approx} H_2$ under the decision-LWE assumption from Theorem 5.2. To prove this, we
design an efficient simulator $\mathcal{S}$ that is given access to an oracle $\mathcal{O}$ that outputs arbitrarily many pairs
$(A^t, B^t) \in \mathbb{Z}_q^{m \times n} \times \mathbb{Z}_q^{m \times n}$, drawn either as a group of samples $(A^t, B^t = A^t S + E \bmod q)$ from the LWE
distribution $A_{S,\chi}$ (for the same $S \leftarrow \chi^{n \times n}$) or from the uniform distribution, and $\mathcal{S}$ emulates either game
$H_1$ or $H_2$ (respectively) to an attacker. Under the decision-LWE assumption, this will establish the claim.
The simulator $\mathcal{S}$ answers queries $x = (x', x_i)$ where $|x'| = i - 1$ in the following way: if $x'$ has never been
queried before, then it draws a new sample $(A_{x'}^t, B_{x'}^t)$ from $\mathcal{O}$ and stores it, otherwise it looks up the already
stored $(A_{x'}^t, B_{x'}^t)$. It then returns $A_{x'}^t$ if $x_i = 0$, and $B_{x'}^t$ if $x_i = 1$. It is clear by inspection and the definition
of $\mathcal{H}^{(i)}$ that $\mathcal{S}$ has the claimed behavior given the two types of oracles $\mathcal{O}$.
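The second simulator is again a small piece of bookkeeping around the sample oracle. In this sketch, `LweSimulator` is a hypothetical name and `oracle` is assumed to be a zero-argument callable that returns one pair $(A^t, B^t)$ per call, drawn either from the LWE distribution with a fixed secret or uniformly.

```python
class LweSimulator:
    """Answers queries (x', x_i) using one stored oracle sample per prefix x'."""

    def __init__(self, oracle):
        self.oracle = oracle      # callable returning a pair (At, Bt)
        self.samples = {}         # prefix x' (tuple of bits) -> (At, Bt)

    def query(self, bits):
        prefix, last = tuple(bits[:-1]), bits[-1]
        if prefix not in self.samples:        # first time this prefix is seen
            self.samples[prefix] = self.oracle()
        At, Bt = self.samples[prefix]
        return Bt if last else At             # A^t for x_i = 0, B^t for x_i = 1
```

With LWE samples the answers have the form $A_{x'}^t \cdot S_i^{x_i} + x_i \cdot E_{x'}$, matching game $H_1$; with uniform samples every answer is uniform and independent across distinct queries, matching game $H_2$.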

By the triangle inequality, we have $H_0 \stackrel{c}{\approx} H_2$, i.e., $\tilde{\mathcal{G}}^{(i)}$ is a pseudorandom function family. □

We  now  analyze  the  ring- LWE  construction. 


Proof sketch for Theorem 5.3. The proof proceeds almost identically to the proof of Theorem 5.2, so we only
outline the few small differences. We define the function families $\mathcal{RG}$ and $\widetilde{\mathcal{RG}}$ in exactly the same fashion as
the families $\mathcal{G}$ and $\tilde{\mathcal{G}}$, respectively, with $a \in R_q$, $s_i \in R$ and $e_{x'} \in R$ substituting $A$, $S_i$ and $E_{x'}$ respectively.
In the games, the bad event $\mathrm{BAD}_x$ occurs if any coefficient of $\widetilde{RG}(x) \in R_q$ (for $\widetilde{RG} \leftarrow \widetilde{\mathcal{RG}}$) is "too close"
to another element in $\mathbb{Z}_q$ having a different rounded value, where "too close" is defined using the interval
$[-B, B]$ for $B = k(r\sqrt{n} \cdot \omega(\sqrt{\log n}))^k / \sqrt{n}$. For this bound $B$, the analogue of Lemma 5.5 (which bounds
the cumulative error terms, i.e., the difference $\widetilde{RG}(x) - RG(x)$) follows immediately from Lemma 2.3.
Finally, pseudorandomness of the family $\widetilde{\mathcal{RG}}$ follows analogously to the proof of Theorem 5.6 via families
$\mathcal{RH}^{(i)}$ defined similarly to $\mathcal{H}^{(i)}$. □


Remark 5.8. By almost identical proofs, a similar subset-product-like construction

$$F_{A,\{S_{i,b}\}}(x_1 \cdots x_k) = \Big\lfloor A^t \cdot \prod_{i=1}^{k} S_{i,x_i} \Big\rceil_p \qquad (5.3)$$

for uniform $A \in \mathbb{Z}_q^{n \times m}$ and matrices $S_{i,b} \in \mathbb{Z}^{n \times n}$ (for $i \in [k]$, $b \in \{0,1\}$), and the analogous function
in the ring setting, are also PRF families for the same parameters and distributions as in Theorem 5.2 and
Theorem 5.3. (These functions are analogous to the factoring-based PRF of [NRR00].) While the secret keys
are about twice as large as their counterparts' from Definition 5.1, these functions are more "symmetric,"
which may be important in practice (e.g., to prevent timing attacks).


5.4  Security  Proof  Under  Interactive  LWR 

We now present an “interactive” LWR assumption and prove that under this assumption, the degree-$k$
construction from Definition 5.1 is a PRF under an appropriate distribution of the $S_i$. The advantage of this
proof is that it allows us to prove security for a small modulus $q$ and inverse error rate (both small polynomials
in $n$), and it also works for uniformly random (or uniform invertible) matrices $S_i$, among other distributions.
For example, this allows us to compose the degree-$k$ construction with itself (or with any other PRF) in a
$k$-ary tree. The drawback to our proof is that it relies on a stronger assumption that is harder to evaluate or
falsify, because it allows the adversary to make queries to its challenger.

Definition 5.9 ($k$-subset-product LWR). Let $q > p$ be integer moduli. We describe a pair of games, which
are parameterized by integers $k \geq 1$ and $m = \mathrm{poly}(n)$, and a distribution $\psi$ over $\mathbb{Z}_q^{n \times n}$ (e.g., the uniform
distribution). In both games, we choose $A \in \mathbb{Z}_q^{n \times m}$ uniformly at random and $S_i \leftarrow \psi$ independently for each
$i \in [k]$, then give $A$ and $S_i$ for $i \in [k-1]$ to the attacker. We then allow the attacker to adaptively make
queries to a function $H \colon \{0,1\}^{k-1} \to \mathbb{Z}_p^{m \times n}$. In the first game, the function $H$ is defined to be

$$H(x) = \Big\lfloor A^t \cdot \prod_{i=1}^{k-1} S_i^{x_i} \cdot S_k \Big\rceil_p ;$$

in the second game, $H$ is a uniformly random function. The $k$-subset-product LWR problem, denoted
$k\text{-LWR}_{q,p,m,\psi}$, is to distinguish between these two games with an advantage non-negligible in $n$. The
$k$-subset-product ring-LWR problem is defined analogously. (A subset-product version of (ring-)LWE is also
easy to formulate, where instead of rounding we add random and independent error terms to each answer.)
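
As an illustration of the first game, the following toy sketch evaluates $H(x)$ on small integer matrices. It is not a secure or reference implementation: the helper names (`mat_mul`, `round_to_p`, `subset_product_lwr`) are hypothetical, and the rounding map is assumed to send $v \in \mathbb{Z}_q$ to the nearest integer to $(p/q) \cdot v$, reduced modulo $p$.

```python
def mat_mul(X, Y, q):
    """Multiply matrices X (r x s) and Y (s x t), reducing entries mod q."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]

def round_to_p(v, q, p):
    """Assumed rounding Z_q -> Z_p: nearest integer to (p/q)*v, taken mod p."""
    return ((2 * p * v + q) // (2 * q)) % p

def subset_product_lwr(A_t, S, x, q, p):
    """Toy H(x): round(A^t * prod_{i<k} S_i^{x_i} * S_k), with S = [S_1, ..., S_k]."""
    B = A_t
    for S_i, bit in zip(S[:-1], x):   # only the S_i selected by x_i = 1 are multiplied in
        if bit:
            B = mat_mul(B, S_i, q)
    B = mat_mul(B, S[-1], q)          # the withheld matrix S_k is always applied
    return [[round_to_p(v, q, p) for v in row] for row in B]
```

In the second game the challenger would instead answer each fresh query with an independent uniform matrix over $\mathbb{Z}_p$; the attacker's task is to tell the two behaviors apart.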

We make a few simple observations about the $k$-LWR problem. First note that $B_x^t = A^t \cdot \prod_{i=1}^{k-1} S_i^{x_i}$ is
the part of the product that changes for each new query. Since $A$ and all the $S_i$ for $i \in [k-1]$ are given to the
attacker, it can compute each $B_x$ on its own, and its goal is to determine whether the challenger is returning
rounded products $\lfloor B_x^t \cdot S_k \rceil_p$ or uniformly random and independent values. In effect, the $k$-LWR problem is
therefore to solve LWR when the sampled $A$ matrices are related by adversarially chosen subset-products of
given random matrices $S_i$. To avoid an efficient attack (as outlined in the introduction), the distribution $\psi$
should be chosen so that the product of many $S_i \leftarrow \psi$ does not significantly reduce the entropy of $A^t \cdot \prod_i S_i$.
It appears that restricting $\psi$ to invertible elements is most effective for this purpose.

We also observe that $1\text{-LWR}_{q,p,m,\psi}$ is just the standard $\mathrm{LWR}_{q,p}$ problem given $m$ samples, where the
secret matrix $S$ is chosen from $\psi$. The problems form a hierarchy over $k$; that is, $k\text{-LWR}_{q,p,m,\psi}$ is no harder than
$(k-1)\text{-LWR}_{q,p,m,\psi}$, by a reduction that just prepends 0 to all queries and withholds $S_1$ from the attacker.

Theorem 5.10. Endow the family $\mathcal{F}$ from Definition 5.2 with the distribution where each $S_i$ is drawn from
some distribution $\psi$. Then, assuming that the $k\text{-LWR}_{q,p,m,\psi}$ problem is hard, the family $\mathcal{F}$ is pseudorandom.

Unlike our inductive proof of Theorem 5.6, which transitions from the PRF family to a random function
by “dropping” the secret key components $S_i$ from $i = 1$ to $k$, the proof of Theorem 5.10 drops them from
$i = k$ down to 1. This prevents the error terms from growing with $k$ (because the errors are not compounded
by multiplication with other $S_i$), which is what allows us to use a small modulus $q$ if we so desire. However,
this style of proof also seems to require an interactive assumption, so that a simulator can answer queries
involving the component $S_i$ that is being dropped between adjacent games.

Proof of Theorem 5.10. We prove this by induction over $k$. For $k = 0$, the claim follows trivially by
construction. For $k \geq 1$, we again proceed via a series of games.


Game $H_0$. This is the real PRF attack game against the family $\mathcal{F}$: we choose an $F \leftarrow \mathcal{F}$, and the attacker
has oracle access to $F(\cdot)$.

Game $H_1$. We choose $F \leftarrow \mathcal{F}$. For attacker queries of the form $x = x_1 \ldots x_{k-1}1$ we return a uniformly
random and independent value (consistent with prior answers), and for queries of the form $x = x_1 \ldots x_{k-1}0$,
we return $F(x) = \lfloor A^t \cdot \prod_{i=1}^{k-1} S_i^{x_i} \rceil_p$.

We claim that $H_0 \overset{c}{\approx} H_1$ by a straightforward reduction assuming the hardness of $k$-LWR. As proof, we
construct a simulator $\mathcal{S}$ that interacts with an oracle $O$ that implements one of the two games from the
$k$-LWR problem, and emulates either $H_0$ or $H_1$ respectively. The simulator is first given some matrices $A$
and $S_i$ for $i \in [k-1]$. It then answers attacker queries $x = (x', 0) \in \{0,1\}^k$ by returning $\lfloor A^t \cdot \prod_{i=1}^{k-1} S_i^{x_i} \rceil_p$,
and answers queries $x = (x', 1) \in \{0,1\}^k$ by returning $O(x')$ to the attacker. It is clear by inspection that
the behavior of $\mathcal{S}$ is as claimed.

Game $H_2$. We lazily choose a uniformly random function $U \colon \{0,1\}^k \to \mathbb{Z}_p^{m \times n}$ and give the attacker
oracle access to $U(\cdot)$.

We claim that $H_1 \overset{c}{\approx} H_2$ by the inductive hypothesis. This is because in game $H_1$, queries ending in 1
are already answered uniformly, while queries ending in 0 are answered according to a function drawn from
the family $\mathcal{F}$ of degree $(k-1)$. This family is pseudorandom by the inductive hypothesis, and the fact that
$(k-1)$-LWR is no easier than $k$-LWR.

This completes the induction and the proof. □

Abstract

The security of contemporary homomorphic encryption schemes over cyclotomic number fields relies
on  fields  of  very  large  dimension.  This  large  dimension  is  needed  because  of  the  large  modulus-to-noise 
ratio  in  the  key-switching  matrices  that  are  used  for  the  top  few  levels  of  the  evaluated  circuit.  However,  a 
smaller  modulus-to-noise  ratio  is  used  in  lower  levels  of  the  circuit,  so  from  a  security  standpoint  it  is 
permissible  to  switch  to  lower-dimension  fields,  thus  speeding  up  the  homomorphic  operations  for  the 
lower  levels  of  the  circuit.  However,  implementing  such  field-switching  is  nontrivial,  since  these  schemes 
rely  on  the  field  algebraic  structure  for  their  homomorphic  properties. 

A basic ring-switching operation was used by Brakerski, Gentry and Vaikuntanathan, over rings of the
form $\mathbb{Z}[X]/(X^{2^n} + 1)$, in the context of bootstrapping. In this work we generalize and extend this
technique to work over any cyclotomic number field, and show how it can be used not only for bootstrapping
but also during the computation itself (in conjunction with the “packed ciphertext” techniques of Gentry,
Halevi and Smart).


1  Introduction 

The last few years have seen a rapid advance in the state of fully homomorphic encryption, yet despite
these advances, the existing schemes are still too expensive for many practical purposes. In this paper we
take another step forward in making such schemes more efficient. In particular, we present a technique for
reducing the dimension of the ciphertexts involved in the homomorphic computation of the lower levels of a
circuit. Our techniques apply to homomorphic encryption schemes over number fields, such as the schemes
of Brakerski et al. IS 013, as well as the variants due to Lopez-Alt et al. [14] and Brakerski [2].

The most efficient variants of these schemes work over number fields of the form $\mathbb{Q}(\zeta) \cong \mathbb{Q}[X]/F(X)$,
and in all of them the field dimension $n$, which is the degree of $F(X)$, must be set large enough to ensure
security: to support homomorphic evaluation of depth-$L$ circuits with security parameter $\lambda$, the schemes
require $n = \Omega(L \cdot \mathrm{polylog}(\lambda))$, even under the strongest plausible hardness assumptions for their underlying
computational problems (e.g., ring-LWE [15]).¹ In practice, the field dimension for moderately deep circuits
can easily be many thousands. For example, to be able to evaluate AES homomorphically, Gentry et al. [13]
used circuits of depth $L > 50$, with a corresponding field dimension of over 50,000.

¹The schemes from GO ED can also obtain security by using high-dimensional vectors over low-dimensional number fields. But
their most efficient variants use low-dimensional vectors over high-dimensional fields, since the runtime of certain operations is
cubic in the dimension of the vectors.

As homomorphic operations are performed, the ratio of noise to modulus in the ciphertexts grows.
Consequently, it becomes permissible to use lower-dimension fields, which can speed up further homomorphic
computations. However, since we must start with ciphertexts from a high-dimensional field, we need a
method for transforming them into small-field ciphertexts that encrypt the same (or related) messages.
Such a “field switching” procedure was described by Brakerski et al. 0, in the context of reducing the
ciphertext size prior to bootstrapping. The procedure in 0, however, is specific to number fields of the
form $K_{2^k} = \mathbb{Q}[X]/(X^{2^{k-1}} + 1)$, i.e., cyclotomic number fields with power-of-2 index. Moreover, by itself it
cannot be combined with the “packed evaluation” techniques from [18, 11]. (These techniques use
Chinese-remainder encoding to “pack” many plaintext values into each ciphertext, and then each homomorphic
operation is applied to all these values at once. For our purposes, we must consider the effect of the
field-switching operation on all these plaintext values.) Extending and improving the field switching procedure is
the goal of our work.

1.1  Our  Contribution 

We present a general field-switching transformation that can be applied to any cyclotomic number field
$K = \mathbb{Q}(\zeta_m) \cong \mathbb{Q}[X]/\Phi_m(X)$ for arbitrary $m$ (where $\Phi_m(X) \in \mathbb{Z}[X]$ is the $m$th cyclotomic polynomial),
and works well in conjunction with packed ciphertexts. For any divisor $m'$ of $m$, our procedure takes as input
a “big-field ciphertext” $c$ over $K$ that encrypts many plaintext values, and outputs a “small-field ciphertext”
$c'$ over $K' = \mathbb{Q}(\zeta_{m'}) \cong \mathbb{Q}[X]/\Phi_{m'}(X) \subseteq K$ that encrypts a certain subset of the input plaintext values.²

Our transformation relies heavily on the algebraic properties of the cyclotomic number fields $K$, $K'$
and their respective rings of algebraic integers $R$, $R'$. In particular, we use the interpretation of $K$ as an
extension field of $K'$, and relationships between their various embeddings into the complex numbers $\mathbb{C}$; the
factorization of integer primes in $R$ and $R'$; and the trace function $\mathrm{Tr}_{K/K'}$ that maps elements in $K$ to the
subfield $K'$. With these tools in hand, the transformation itself is quite simple, and consists of the following
three steps:

1. We first apply a key-switching operation to obtain a big-field ciphertext over $K$ with respect to a
small-field secret key $s' \in K' \subseteq K$. Proving the security of this operation relies on a novel way of
embedding the ring-LWE problem over $K'$ into $K$, which may be of independent interest.

2. Next, we multiply the resulting ciphertext by a certain element of the ring $R \subseteq K$, which depends only
on the subset (or other function) of the plaintext values that we want to include in the output ciphertext.

3. Finally, we take the trace of the $K$-elements in the ciphertext, thus obtaining an output ciphertext over
the subfield $K'$, which decrypts under the secret key $s' \in K'$ to the desired plaintext values.

We note that in addition to being simpler and more general than the transformation from 0, our transformation
is also more efficient even when applied in the special case of $K_{2^k}$: when switching from $K_{2^k}$ to $K_{2^{k'}}$, the
transformation from 0 includes a step where the size of the ciphertext (and hence the time that it takes to
perform operations) is expanded by a factor of $2^{k-k'}$. Our transformation does not need that extra step, hence
saving this extra factor in performance.

In Section 2 below we recall the algebraic concepts needed for our transformation, and then the
transformation itself is described in Section 3.

²More generally, the output ciphertext can even encrypt certain linear functions of the input plaintext values.

Notation | Description
$p$, $\mathbb{F}_{p^d}$ | The (prime) modulus of the cryptosystem’s native plaintext space, and the finite field of order $p^d$.
$m$, $m'$, $n = \varphi(m)$, $n' = \varphi(m')$ | The indices of the cyclotomic fields, where $m' \mid m$. We switch from the $m$th to the $m'$th cyclotomic number field, which are of degree $n$, $n'$ (respectively) over the rationals.
$\bar{m}$, $d$, $e$, $f$ | $\bar{m}$ is the largest divisor of $m$ that is coprime with $p$; $d$ is the order of $p$ in $\mathbb{Z}_{\bar{m}}^*$; $e = \varphi(m)/\varphi(\bar{m})$; and $f = \varphi(\bar{m})/d$. Similarly for $\bar{m}'$, $d'$, $e'$, $f'$.
$\zeta_m$, $\zeta_{m'}$ | Abstract elements of order $m$, $m'$ (respectively) over the rationals.
$K = \mathbb{Q}(\zeta_m)$, $K' = \mathbb{Q}(\zeta_{m'})$, $R = \mathbb{Z}[\zeta_m]$, $R' = \mathbb{Z}[\zeta_{m'}]$ | The cyclotomic number fields and their rings of integers.
$\sigma \colon K \to \mathbb{C}^n$, $\sigma' \colon K' \to \mathbb{C}^{n'}$ | The canonical embeddings of $K$, $K'$, which endow the number fields with a geometry.
$\mathrm{Tr}_{K/K'} \colon K \to K'$ | The trace function, which is the sum of the automorphisms of $K$ that fix $K'$ pointwise.
$R^\vee$, $(R')^\vee$ | The codifferent (or dual) fractional ideals of $R$ and $R'$ (respectively), defined as $R^\vee = \{a : \mathrm{Tr}_{K/\mathbb{Q}}(aR) \subseteq \mathbb{Z}\}$ and similarly for $(R')^\vee$.
$G = \mathbb{Z}_{\bar{m}}^*/\langle p \rangle$, $G' = \mathbb{Z}_{\bar{m}'}^*/\langle p \rangle$ | The multiplicative quotient groups that characterize the prime-ideal factorizations of $pR$, $pR'$, respectively.
$g \colon G \to G'$ | The $(f/f')$-to-1 homomorphism defined via $i \mapsto i \bmod \bar{m}'$.

Table 1: Summary of the main algebraic notations.


2  Preliminaries 

This work uses a number of algebraic concepts and notations; to assist the reader we summarize the most
important ones in Table 1. For any positive integer $u$ we let $[u] = \{0, \ldots, u-1\}$. Throughout this work,
for a coset $z \in \mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$ we let $[z]_q \in \mathbb{Z}$ denote its canonical representative in $\mathbb{Z} \cap [-q/2, q/2)$. One
can also view $[\cdot]_q$ as the operation that takes an arbitrary integer $z$ and reduces it modulo $q$ into the interval
$[-q/2, q/2)$.
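
The reduction $[\cdot]_q$ can be sketched in a few lines; the function name `centered_mod` is an illustrative choice, not notation from this work.

```python
def centered_mod(z, q):
    """Return the representative of z modulo q that lies in [-q/2, q/2)."""
    r = z % q            # Python's % already gives a value in [0, q) for q > 0
    if 2 * r >= q:       # shift the upper half of [0, q) down into [-q/2, 0)
        r -= q
    return r
```

For even $q$ the value $-q/2$ is included and $q/2$ is not, which matches the half-open interval above.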

2.1  Algebraic  Background 

Recall that an ideal $I$ in a commutative ring $R$ is a nontrivial (i.e., $I \neq \emptyset$ and $I \neq \{0\}$) additive subgroup which
is closed under multiplication by $R$. For ideals $I$, $J$, their sum is the ideal $I + J = \{a + b : a \in I, b \in J\}$,
and their product $IJ$ is the ideal consisting of all sums of terms $ab$ for $a \in I$, $b \in J$. An $R$-ideal $\mathfrak{p}$ is prime
if $ab \in \mathfrak{p}$ (for some $a, b \in R$) implies $a \in \mathfrak{p}$ or $b \in \mathfrak{p}$ (or both). All the rings we work with have unique
factorization of ideals into powers of prime ideals, and a Chinese Remainder Theorem.

A fractional ideal is, informally, an ideal with a denominator. Formally, letting $K$ be the field of fractions
of $R$, a fractional ideal of $R$ is a subset $I \subseteq K$ for which there exists a denominator $d \in R$ such that $dI \subseteq R$
is an ideal in $R$. For an $R$-ideal $I$, the quotient ring $R_I = R/I$ consists of the residue classes $a + I$ for all
$a \in R$, with the ring operations induced by $R$. More generally, for a (possibly fractional) ideal $I$ and an ideal
$J \subseteq R$, the quotient $I_J = I/IJ$ is an additive group, and an $R$-module, with addition and multiplication
operations induced by $R$. We often write $a \bmod I$ instead of $a + I$ to denote the residue classes $a + I$, and
we write $a = b \pmod{I}$ to denote that $a$, $b$ belong to the same residue class, i.e., $a + I = b + I$.

For  computational  purposes,  all  of  the  rings  and  fields  we  work  with  have  efficient  representations  of 
their  elements,  and  efficient  (i.e.,  polynomial  time  in  the  bit  length  of  the  arguments)  algorithms  for  all  the 
operations  we  use.  For  quotients  A/B,  cosets  are  represented  using  a  fixed  set  of  distinguished  representatives. 
In this work we largely ignore the details of concrete representations and algorithms, and refer to [16] for
fast, specialized algorithms for working with the cyclotomic fields and rings that we use in this work.


2.1.1  Cyclotomic  Fields  and  Rings 

For a positive integer $m$, let $K = \mathbb{Q}(\zeta_m)$ be the $m$th cyclotomic number field, where $\zeta_m$ is an abstract element
of order $m$. (In particular, we do not view $\zeta_m$ as any particular root of unity in $\mathbb{C}$.) The minimal polynomial of
$\zeta_m$ is the $m$th cyclotomic polynomial $\Phi_m(X) = \prod_{i \in \mathbb{Z}_m^*} (X - \omega_m^i) \in \mathbb{Z}[X]$, where $\omega_m = \exp(2\pi\sqrt{-1}/m) \in \mathbb{C}$
is the principal $m$th complex root of unity, and the roots $\omega_m^i \in \mathbb{C}$ range over all the primitive complex
$m$th roots of unity. Therefore, $K$ is a field extension of degree $n = \varphi(m)$ over $\mathbb{Q}$, and is isomorphic to the
polynomial ring $\mathbb{Q}[X]/\Phi_m(X)$ by identifying $\zeta_m$ with $X$. (There are other representations of $K$ as well, and
nothing in this work depends on a particular choice of representation.) The ring of (algebraic) integers in $K$,
called the $m$th cyclotomic ring, is $R = \mathbb{Z}[\zeta_m]$, which is isomorphic to $\mathbb{Z}[X]/\Phi_m(X)$.
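
To make the polynomial representation concrete, the sketch below computes $\Phi_m(X)$ from the standard identity $X^m - 1 = \prod_{d \mid m} \Phi_d(X)$ and lets one check that its degree is $\varphi(m)$. The function names are hypothetical, polynomials are plain coefficient lists (lowest degree first), and no attempt is made at efficiency.

```python
from math import gcd

def poly_divide_exact(num, den):
    """Exact division of integer polynomials (lowest degree first); den must be monic."""
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + len(den) - 1]      # leading coefficient, since den is monic
        for j, c in enumerate(den):
            num[i + j] -= quot[i] * c
    assert not any(num), "division was not exact"
    return quot

def cyclotomic_poly(m):
    """Phi_m(X), obtained by dividing X^m - 1 by Phi_d(X) for every proper divisor d of m."""
    poly = [-1] + [0] * (m - 1) + [1]        # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly = poly_divide_exact(poly, cyclotomic_poly(d))
    return poly

def euler_phi(m):
    """Euler's totient: the number of units in Z_m."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)
```

For a power-of-2 index such as $m = 8$ this yields $X^4 + 1$, the familiar special case.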

The field extension $K/\mathbb{Q}$ has $n$ automorphisms $\tau_i \colon K \to K$ that fix $\mathbb{Q}$ pointwise, which are
characterized by $\tau_i(\zeta_m) = \zeta_m^i$ for $i \in \mathbb{Z}_m^*$. (Equivalently, $\tau_i(a(X)) = a(X^i) \bmod \Phi_m(X)$ when viewing $K$ as
$\mathbb{Q}[X]/\Phi_m(X)$.) Because $K/\mathbb{Q}$ is Galois (i.e., the number of automorphisms equals the dimension of the
extension), the $\mathbb{Q}$-linear³ (field) trace $\mathrm{Tr}_{K/\mathbb{Q}} \colon K \to \mathbb{Q}$ can be defined as the sum of the automorphisms:
$\mathrm{Tr}_{K/\mathbb{Q}}(a) = \sum_{i \in \mathbb{Z}_m^*} \tau_i(a) \in \mathbb{Q}$. (See below for another formulation.)

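Similarly to the automorphisms $\tau_i$ (which map $K$ to itself), there are $n$ concrete ways of viewing $K$
as a subfield of the complex numbers $\mathbb{C}$. Namely, there are $n$ injective ring homomorphisms from $K$ to $\mathbb{C}$
that fix $\mathbb{Q}$ pointwise, called embeddings, which are denoted $\sigma_i \colon K \to \mathbb{C}$ for $i \in \mathbb{Z}_m^*$ and characterized by
$\sigma_i(\zeta_m) = \omega_m^i$. The embeddings may be seen as the compositions of the abstract automorphisms $\tau_i$ with the
complex embedding that identifies $\zeta_m \in K$ with $\omega_m \in \mathbb{C}$. Therefore, the field trace can also be written as the
sum of the embeddings, as $\mathrm{Tr}_{K/\mathbb{Q}}(a) = \sum_{i \in \mathbb{Z}_m^*} \sigma_i(a) \in \mathbb{Q}$. The canonical embedding $\sigma \colon K \to \mathbb{C}^n$ is the
concatenation of all the complex embeddings, i.e., $\sigma(a) = (\sigma_i(a))_{i \in \mathbb{Z}_m^*}$, and it endows $K$ with a canonical
geometry. In particular, define the Euclidean ($\ell_2$) and $\ell_\infty$ norms on $K$ as

$$\|a\| := \|\sigma(a)\| = \Big( \sum_{i \in \mathbb{Z}_m^*} |\sigma_i(a)|^2 \Big)^{1/2} \quad \text{and} \quad \|a\|_\infty := \|\sigma(a)\|_\infty = \max_{i \in \mathbb{Z}_m^*} |\sigma_i(a)|,$$

respectively. Note that $\|a \cdot b\| \leq \|a\|_\infty \cdot \|b\|$ and $\|a \cdot b\|_\infty \leq \|a\|_\infty \cdot \|b\|_\infty$ for any $a, b \in K$, because the $\sigma_i$
are ring homomorphisms.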
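
The canonical embedding and the two norms can be checked numerically. The sketch below is a floating-point illustration only: it represents $a \in K$ by a coefficient list in powers of $\zeta_m$ and evaluates it at the primitive roots $\omega_m^i$. The function names are assumptions made for this example.

```python
import cmath
from math import gcd, sqrt

def canonical_embedding(coeffs, m):
    """sigma(a) for a = sum_j coeffs[j] * zeta_m^j, i.e. a evaluated at omega_m^i, i in Z_m^*."""
    units = [i for i in range(1, m + 1) if gcd(i, m) == 1]
    omega = cmath.exp(2j * cmath.pi / m)
    return [sum(c * omega ** (i * j) for j, c in enumerate(coeffs)) for i in units]

def l2_norm(vec):
    """Euclidean norm of a complex vector."""
    return sqrt(sum(abs(z) ** 2 for z in vec))

def linf_norm(vec):
    """Largest absolute value among the entries."""
    return max(abs(z) for z in vec)

def field_trace(coeffs, m):
    """Tr_{K/Q}(a) as the sum of the embeddings (real up to rounding error)."""
    return sum(canonical_embedding(coeffs, m)).real
```

Because each $\sigma_i$ is a ring homomorphism, $\sigma(a \cdot b)$ is the coordinate-wise product of $\sigma(a)$ and $\sigma(b)$, which is exactly why the norm inequalities above hold.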


2.1.2  Towers  of  Cyclotomics 

For any positive integer $m'$ dividing $m$, let $K' = \mathbb{Q}(\zeta_{m'})$ and $R' = \mathbb{Z}[\zeta_{m'}]$ be the $m'$th cyclotomic field
and ring (of dimension $n' = \varphi(m')$ over $\mathbb{Q}$ and $\mathbb{Z}$), respectively. As above, the field extension $K'/\mathbb{Q}$ has
$n' = \varphi(m')$ automorphisms $\tau'_{i'} \colon K' \to K'$ and $n'$ complex embeddings $\sigma'_{i'} \colon K' \to \mathbb{C}$ (for $i' \in \mathbb{Z}_{m'}^*$), the
latter of which define the canonical embedding $\sigma' \colon K' \to \mathbb{C}^{n'}$.

³A function $f$ is $S$-linear if $f(a + b) = f(a) + f(b)$ and $f(s \cdot a) = s \cdot f(a)$ for all $s \in S$ and all $a$, $b$.

We will use extensively the fact that $K$ is a field extension of $K'$, and $R$ is a ring extension of $R'$, both of
dimension $n/n'$ (because $K/\mathbb{Q}$ and $K'/\mathbb{Q}$ have dimensions $n$ and $n'$, respectively). That is, $K'$ and $R'$ may
respectively be seen as a subfield of $K = K'(\zeta_m)$ and a subring of $R = R'[\zeta_m]$, under the ring embedding
that identifies $\zeta_{m'}$ with $\zeta_m^{m/m'}$. Moreover, the field extension $K/K'$ is Galois, i.e., it has $n/n'$ automorphisms
that fix $K'$ pointwise, which are precisely those $\tau_i$ for which $i = 1 \pmod{m'}$. This follows from the fact that

$$\tau_i(\zeta_{m'}) = \tau_i(\zeta_m^{m/m'}) = \zeta_m^{i \cdot m/m'} = \zeta_{m'}^{i \bmod m'}, \qquad (2.1)$$

and that reducing modulo $m'$ induces an $(n/n')$-to-1 mapping from $\mathbb{Z}_m^*$ to $\mathbb{Z}_{m'}^*$. The $K'$-linear (intermediate)
trace function $\mathrm{Tr}_{K/K'} \colon K \to K'$ may be defined as the sum of these automorphisms:

$$\mathrm{Tr}_{K/K'}(a) = \sum_{i = 1 \ (\mathrm{mod}\ m')} \tau_i(a).$$

A standard fact from field theory is that the intermediate trace satisfies $\mathrm{Tr}_{K/\mathbb{Q}} = \mathrm{Tr}_{K'/\mathbb{Q}} \circ \mathrm{Tr}_{K/K'}$. Another
standard fact is that $\mathrm{Tr}_{K/K'}$ is a “universal” $K'$-linear function, in that any such function $L \colon K \to K'$ can be
expressed as $L(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ for some fixed $r \in K$.

Similarly to Equation (2.1), for any $i \in \mathbb{Z}_m^*$ the embedding $\sigma_i$ coincides with $\sigma'_{i \bmod m'}$ on the subfield $K'$.
Using this fact we get the following relation between the intermediate trace and the complex embeddings
of $K$ and $K'$.

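Lemma 2.1. For any $a \in K$ and $i' \in \mathbb{Z}_{m'}^*$,

$$\sigma'_{i'}(\mathrm{Tr}_{K/K'}(a)) = \sum_{i = i' \ (\mathrm{mod}\ m')} \sigma_i(a).$$

In matrix form, $\sigma'(\mathrm{Tr}_{K/K'}(a)) = P \cdot \sigma(a)$, where $P$ is the $\varphi(m')$-by-$\varphi(m)$ matrix (with rows indexed by
$i' \in \mathbb{Z}_{m'}^*$ and columns by $i \in \mathbb{Z}_m^*$) whose $(i', i)$th entry is 1 if $i = i' \pmod{m'}$, and is 0 otherwise.

Proof. Fix an arbitrary $k \in \mathbb{Z}_m^*$ such that $k = i' \pmod{m'}$. Then because $\sigma'_{i'}$ coincides with $\sigma_k$ on $K'$, and
by definition of $\mathrm{Tr}_{K/K'}$ and linearity of $\sigma_k$, we have

$$\sigma'_{i'}(\mathrm{Tr}_{K/K'}(a)) = \sigma_k\Big( \sum_{j = 1 \ (\mathrm{mod}\ m')} \tau_j(a) \Big) = \sum_{j = 1 \ (\mathrm{mod}\ m')} \sigma_k(\tau_j(a)) = \sum_{i = i' \ (\mathrm{mod}\ m')} \sigma_i(a),$$

where for the last equality we have used $\sigma_k \circ \tau_j = \sigma_{k \cdot j}$ and $k \in \mathbb{Z}_m^*$, so $i = k \cdot j \in \mathbb{Z}_m^*$ runs over all indices
congruent to $i'$ modulo $m'$ when $j \in \mathbb{Z}_m^*$ runs over all indices congruent to 1 modulo $m'$. □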


An immediate corollary is that the intermediate trace maps short elements of $K$ to short elements of $K'$.

Corollary 2.2. For any $a \in K$, we have $\|\mathrm{Tr}_{K/K'}(a)\| \leq \|a\| \cdot \sqrt{n/n'}$.

Proof. By Lemma 2.1 we have $\sigma'(\mathrm{Tr}_{K/K'}(a)) = P \cdot \sigma(a)$. The rows of $P$ are orthogonal (since each
column of $P$ has exactly one nonzero entry), and each has Euclidean norm exactly $\sqrt{n/n'}$. □
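
The structure of $P$ used in this proof is easy to inspect for small indices. The following sketch builds the 0/1 matrix from Lemma 2.1; `units` and `trace_matrix` are hypothetical helper names.

```python
from math import gcd

def units(m):
    """The elements of Z_m^* in increasing order."""
    return [i for i in range(1, m + 1) if gcd(i, m) == 1]

def trace_matrix(m, m_prime):
    """Matrix P of Lemma 2.1: rows indexed by Z_{m'}^*, columns by Z_m^*, entry 1 iff i = i' (mod m')."""
    assert m % m_prime == 0, "m' must divide m"
    return [[1 if (i - i_prime) % m_prime == 0 else 0 for i in units(m)]
            for i_prime in units(m_prime)]
```

For example, with $m = 12$ and $m' = 4$ the result is `[[1, 1, 0, 0], [0, 0, 1, 1]]`: each column has a single 1, and each row has $n/n' = 2$ ones, hence Euclidean norm $\sqrt{n/n'}$.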


Figure 1: Factorization of $2 \in \mathbb{Z}$ into distinct prime ideals $\mathfrak{p}'_{i'}$ in $R' = \mathbb{Z}[\zeta_7]$, and $\mathfrak{p}_i$ in $R = \mathbb{Z}[\zeta_{91}]$. The
displayed subscripts indicate a choice of representatives from the cosets of the multiplicative subgroups
$\langle 2 \rangle \subseteq \mathbb{Z}_7^*$ and $\langle 2 \rangle \subseteq \mathbb{Z}_{91}^*$, which have orders $d' = 3$ and $d = 12$, respectively.

2.1.3  Prime  Splitting  and  Plaintext  Arithmetic 

We now describe the factorization (“splitting”) of prime integers in cyclotomic rings, how it allows for
encoding and operating on several finite-field elements, and the particular functions induced by the (intermediate)
trace function $\mathrm{Tr}_{K/K'}$. Further details and proofs can be found in many texts on algebraic number theory,
e.g., [19].

Prime splitting. Let $p \in \mathbb{Z}$ be a prime integer. In the $m$th cyclotomic ring $R = \mathbb{Z}[\zeta_m]$ (which has degree
$n = \varphi(m)$ over $\mathbb{Z}$), $pR$ is often not a prime ideal, but instead factors into prime ideals. To describe how, we
first need to introduce some notation. Divide out all the factors of $p$ from $m$, writing $m = \bar{m} \cdot p^k$ where
$p \nmid \bar{m}$. Let $e = \varphi(p^k)$, and let $d$ be the multiplicative order of $p$ modulo $\bar{m}$ (i.e., in $\mathbb{Z}_{\bar{m}}^*$); note that $d$ divides
$\varphi(\bar{m}) = n/e$. (The values $d$, $e$ are respectively called the inertial degree and ramification index of $p$ in $R$.)
Let $G = \mathbb{Z}_{\bar{m}}^*/\langle p \rangle$, the multiplicative quotient group $\mathbb{Z}_{\bar{m}}^*$ modulo the order-$d$ subgroup generated by $p$, so $G$
has order $f = \varphi(\bar{m})/d = n/(de)$. For an element $i \in G$ of this group, we sometimes write $i\langle p \rangle$ to emphasize
that it is a coset, and (slightly abusing notation) also let $i \in \mathbb{Z}_{\bar{m}}^*$ denote some element of the coset. The ideal
$pR$ factors as

$$pR = \prod_{i \in G} \mathfrak{p}_i^e,$$

where the $\mathfrak{p}_i$ are distinct prime ideals in $R$, all having norm $|R/\mathfrak{p}_i| = p^d$. These are called the prime ideals
lying over $p$ in $R$. Each quotient ring $R/\mathfrak{p}_i$ is therefore isomorphic to the finite field $\mathbb{F}_{p^d}$. (In fact there are
exactly $d$ isomorphisms between them, because $\mathbb{F}_{p^d}$ has $d$ automorphisms.)
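
The parameters $\bar{m}$, $e$, $d$, $f$ are straightforward to compute for concrete $p$ and $m$. The sketch below does so by brute force; the function names are illustrative assumptions and the code is intended only for small inputs.

```python
from math import gcd

def euler_phi(m):
    """Euler's totient: the number of units in Z_m."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)

def splitting_parameters(p, m):
    """Return (m_bar, e, d, f) describing how the prime p factors in Z[zeta_m]."""
    m_bar, k = m, 0
    while m_bar % p == 0:            # divide out all factors of p: m = m_bar * p^k
        m_bar //= p
        k += 1
    e = euler_phi(p ** k)            # ramification index
    d, x = 1, p % m_bar              # inertial degree: multiplicative order of p mod m_bar
    while x != 1 % m_bar:
        x = x * p % m_bar
        d += 1
    f = euler_phi(m_bar) // d        # number of distinct prime ideals lying over p
    return m_bar, e, d, f
```

For $p = 2$ this gives $(7, 1, 3, 2)$ when $m = 7$ and $(91, 1, 12, 6)$ when $m = 91$, consistent with the orders $d' = 3$ and $d = 12$ quoted in the caption of Figure 1. In every case $e \cdot d \cdot f = \varphi(m)$.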

Concretely, the prime ideals $\mathfrak{p}_i$, and the isomorphisms between $R/\mathfrak{p}_i$ and (some canonical representation
of) $\mathbb{F}_{p^d}$, are as follows. Let $\omega_{\bar{m}}$ denote some arbitrary element of order $\bar{m}$ in $\mathbb{F}_{p^d}$; such an element exists
because the multiplicative group $\mathbb{F}_{p^d}^*$ is cyclic and has order $p^d - 1 = 0 \pmod{\bar{m}}$. For any $i\langle p \rangle \in G$,
the prime ideal $\mathfrak{p}_i$ is the kernel of the ring homomorphism $h_i \colon R \to \mathbb{F}_{p^d}$ defined by $h_i(\zeta_m) = \omega_{\bar{m}}^i$. It is
immediate that this kernel is an ideal; furthermore, it is invariant under the choice of representative $i$ from the
coset $i\langle p \rangle$, because $h_{ip}(r) = h_i(r)^p$ for any $r \in R$ (since $(a + b)^p = a^p + b^p$ for any $a, b \in \mathbb{F}_{p^d}$). Because $\mathfrak{p}_i$
is the kernel of $h_i$, we have the induced isomorphism $h_i \colon R/\mathfrak{p}_i \to \mathbb{F}_{p^d}$; indeed, we have $d$ distinct such
isomorphisms, one for each element of the coset $i\langle p \rangle$.

Looking ahead, the isomorphisms $h_i$ (for appropriate choices of representatives $i$) will be used to define
several “plaintext slots” in a homomorphic cryptosystem, i.e., an encoding of $f$ plaintext elements of $\mathbb{F}_{p^d}$ as a
single element of the cryptosystem’s plaintext ring $R/pR$.

Splitting in cyclotomic towers. Of course, the above derivation also applies to the ideals that lie over $p$ in
$R' = \mathbb{Z}[\zeta_{m'}] \subseteq R$. For each such ideal $\mathfrak{p}'$, we next describe the factorization of $\mathfrak{p}'R$ into prime ideals in $R$.
These are the prime ideals that lie over $\mathfrak{p}'$ in $R$, and since “lying over” is an associative property, they also lie
over $p$ (as illustrated in Figure 1).

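Let $\bar{m}$, $d$, $e$, $f$, $G$ and the prime ideals $\mathfrak{p}_i$ for $i \in G$ be as above for $R$, and define $\bar{m}'$, $d'$, $e'$, $f'$,
$G' = \mathbb{Z}_{\bar{m}'}^*/\langle p \rangle$ and prime ideals $\mathfrak{p}'_{i'}$ for $i' \in G'$ similarly for $R'$. Note that $d' \mid d$, $e' \mid e$, and $f' \mid f$, and that the natural
homomorphism $g \colon G \to G'$ defined via $i \mapsto i \bmod \bar{m}'$ is surjective and $(f/f')$-to-1. Then for every $i' \in G'$,
the factorization of $\mathfrak{p}'_{i'}R$ is

$$\mathfrak{p}'_{i'}R = \prod_{i \in g^{-1}(i')} \mathfrak{p}_i^{e/e'} = \prod_{i = i' \ (\mathrm{mod}\ \bar{m}')} \mathfrak{p}_i^{e/e'}.$$

Therefore, there are $f/f'$ prime ideals of $R$ lying over each $\mathfrak{p}'_{i'}$, and taken over all $i' \in G'$ they partition the
prime ideals of $R$ lying over $p$.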

Plaintext encoding. Let $\mathbb{F} = \mathbb{F}_{p^d}$ and $\mathbb{F}' = \mathbb{F}_{p^{d'}} \subseteq \mathbb{F}$. By the above and the Chinese Remainder Theorem,
the natural ring homomorphisms yield the following (where $\cong$ denotes a ring isomorphism):

$$R'/pR' \to R'/\Big(\prod_{i' \in G'} \mathfrak{p}'_{i'}\Big) \cong \bigoplus_{i' \in G'} R'/\mathfrak{p}'_{i'} \cong (\mathbb{F}')^{f'}$$

$$R/pR \to R/\Big(\prod_{i \in G} \mathfrak{p}_i\Big) = R/\Big(\prod_{i' \in G'} \prod_{i \in g^{-1}(i')} \mathfrak{p}_i\Big) \cong \bigoplus_{i' \in G'} \bigoplus_{i \in g^{-1}(i')} R/\mathfrak{p}_i \cong (\mathbb{F}^{f/f'})^{f'}.$$

(Note that the first homomorphism in each line is surjective, but not necessarily an isomorphism, due to
possible ramification.) Following [18, 3, 11, 12, 13], in the context of homomorphic encryption the above
morphisms allow for encoding a vector of $f'$ individual elements of $\mathbb{F}'$ (respectively, $f$ elements of $\mathbb{F}$) into the
plaintext ring $R'_p = R'/pR'$ (resp., $R_p = R/pR$), so that a single homomorphic addition and multiplication
acts component-wise on the underlying vectors of field elements.


Trace operations. As mentioned in the introduction, our field-switching technique is built around applying
the trace function $\mathrm{Tr}_{K/K'}$ to the elements of a big-field ciphertext, thus obtaining a related small-field
ciphertext. Since we use “packed” ciphertexts that encrypt arrays of elements in $\mathbb{F}$ via the above isomorphisms,
we need to understand the effect of the trace function on those $\mathbb{F}$-elements.

The remainder of this subsection is therefore devoted to characterizing the functions $(\mathbb{F}^{f/f'})^{f'} \to (\mathbb{F}')^{f'}$
that can be induced by $\mathrm{Tr}_{K/K'}$. More specifically, we determine exactly which functions

$$L \colon R/\Big(\prod_{i \in G} \mathfrak{p}_i\Big) \to R'/\Big(\prod_{i' \in G'} \mathfrak{p}'_{i'}\Big)$$

can be expressed as $L(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ for some fixed $r \in K$. It turns out that by fixing an appropriate
choice of isomorphisms between the quotient rings and finite fields above, we can obtain the concatenation
of any $f'$ individual $\mathbb{F}'$-linear functions $\mathbb{F}^{f/f'} \to \mathbb{F}'$ (see Corollary 2.5 for a precise statement).⁴

⁴Note that any $\mathbb{F}'$-linear function $L \colon \mathbb{F}^{f/f'} \to \mathbb{F}'$ can always be expressed as $L(a) = \mathrm{Tr}_{\mathbb{F}/\mathbb{F}'}(\langle d, a \rangle)$ for some fixed $d \in \mathbb{F}^{f/f'}$,
where $\langle \cdot, \cdot \rangle$ is the usual inner product and $\mathrm{Tr}_{\mathbb{F}/\mathbb{F}'}$ denotes the ($\mathbb{F}'$-linear) trace of the field extension $\mathbb{F}/\mathbb{F}'$.

As already noted, the isomorphisms between the quotient rings and finite fields are not necessarily
unique; they are determined by the choice of representatives $i'$, $i$ of the cosets $i'\langle p \rangle \subseteq \mathbb{Z}_{\bar{m}'}^*$ and $i\langle p \rangle \subseteq \mathbb{Z}_{\bar{m}}^*$
(respectively), and roots of unity $\omega_{\bar{m}'} \in \mathbb{F}'$ and $\omega_{\bar{m}} \in \mathbb{F}$. For our purposes, it is important to choose
these in a “consistent” fashion, as follows. First, given $\omega_{\bar{m}}$, let $\omega_{\bar{m}'} = \omega_{\bar{m}}^{\bar{m}/\bar{m}'} \in \mathbb{F}'$. (Note that all $\varphi(\bar{m}')$
elements of order $\bar{m}'$ in $\mathbb{F}$ are indeed in the subfield $\mathbb{F}'$.) Next, let $\ell \geq 0$ be the integer exponent such
that $m/m' = (\bar{m}/\bar{m}') \cdot p^\ell$. Then given representative $i'$ of $i'\langle p \rangle \in G'$, choose representative $i$ for each
$i\langle p \rangle \in g^{-1}(i')$ so that $p^\ell \cdot i = i' \pmod{\bar{m}'}$. Note that such $i$ always exists, by definition of the quotient
group $G$ and the mapping $g$. As explained above, these choices fix particular isomorphisms


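$$h_i \colon R/\mathfrak{p}_i \to \mathbb{F} \ (\text{for } i\langle p \rangle \in G) \quad \text{and} \quad h'_{i'} \colon R'/\mathfrak{p}'_{i'} \to \mathbb{F}' \ (\text{for } i'\langle p \rangle \in G'),$$

which are characterized by $h_i(\zeta_m) = \omega_{\bar{m}}^i$ and $h'_{i'}(\zeta_{m'}) = \omega_{\bar{m}'}^{i'}$.

Next, for each $i' \in G'$ denote the product of prime ideals lying over $\mathfrak{p}'_{i'}$ in $R$ (called the radical of $\mathfrak{p}'_{i'}R$)
by $\bar{\mathfrak{p}}_{i'} = \prod_{i \in g^{-1}(i')} \mathfrak{p}_i$, and define the ring isomorphism

$$\bar{h}_{i'} \colon R/\bar{\mathfrak{p}}_{i'} \to \mathbb{F}^{f/f'}, \qquad \bar{h}_{i'}(a) = (h_i(a \bmod \mathfrak{p}_i))_{i \in g^{-1}(i')},$$

where $\mathbb{F}^{f/f'}$ denotes the product ring with coordinate-wise operations.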

In Lemma 2.4 below, we show that under the above isomorphisms, the $\mathbb{F}'$-linear functions $L \colon \mathbb{F}^{\ell/\ell'} \to \mathbb{F}'$ correspond bijectively with the $R'$-linear functions $\bar{L} \colon R/\bar{\mathfrak{p}}_{i'} \to R'/\mathfrak{p}'_{i'}$, for all $i' \in G'$. Recall that any function of the latter type can be expressed as $\bar{L}(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ for some fixed $r \in K$. Conversely, every function $\bar{L}$ (with domain and range as above) that can be expressed as $\bar{L}(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ is clearly $R'$-linear, so it always induces an $\mathbb{F}'$-linear function. The heart of Lemma 2.4 is the following fact.


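Lemma 2.3. Let $\mathfrak{p}'_{i'}$ for some $i' \in G'$ be a prime ideal lying over $p$ in $R'$, and let $\bar{\mathfrak{p}}_{i'}$ be the radical of $\mathfrak{p}'_{i'} R$. Let $r' \in R' \subseteq R$ be arbitrary, and let $s = h'_{i'}(r' \bmod \mathfrak{p}'_{i'}) \in \mathbb{F}' \subseteq \mathbb{F}$. Then

$$\bar{h}_{i'}(r' \bmod \bar{\mathfrak{p}}_{i'}) = (s, s, \ldots, s) \in (\mathbb{F}')^{\ell/\ell'},$$

i.e., every entry of $\bar{h}_{i'}(r' \bmod \bar{\mathfrak{p}}_{i'})$ is equal to $h'_{i'}(r' \bmod \mathfrak{p}'_{i'})$.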


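Proof. Recall that under our choice of isomorphisms, $\omega_{\bar{m}'} = \omega_{\bar{m}}^{\bar{m}/\bar{m}'} \in \mathbb{F}'$ is of order $\bar{m}'$, and $p^{l} \cdot i = i' \bmod \bar{m}'$, where $l \ge 0$ is the integer satisfying $m/m' = (\bar{m}/\bar{m}') \cdot p^{l}$. Also recall that

$$\bar{h}_{i'}(r' \bmod \bar{\mathfrak{p}}_{i'}) = \big( h_i(r' \bmod \mathfrak{p}_i) \big)_{i \in g^{-1}(i')}.$$

For the representative $i$ of each coset $i\langle p\rangle \in g^{-1}(i')$, the entry $h_i(r' \bmod \mathfrak{p}_i)$ is obtained by mapping $\zeta_m$ to $\omega_{\bar{m}}^{i}$, and hence also mapping $\zeta_{m'} = \zeta_m^{m/m'} = \zeta_m^{(\bar{m}/\bar{m}') \cdot p^{l}}$ to

$$\omega_{\bar{m}}^{i \cdot (\bar{m}/\bar{m}') \cdot p^{l}} = \omega_{\bar{m}'}^{p^{l} \cdot i} = \omega_{\bar{m}'}^{i'} \in \mathbb{F}',$$

which is exactly the mapping done by $h'_{i'}$. Since $r' \in R' = \mathbb{Z}[\zeta_{m'}]$, this proves the claim. □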


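Lemma 2.4. Let $i' \in G'$ be arbitrary, and let $\mathfrak{p}' = \mathfrak{p}'_{i'}$ and $\mathfrak{p} = \bar{\mathfrak{p}}_{i'}$. Then under the isomorphisms $h' = h'_{i'}$ and $h = \bar{h}_{i'}$ defined above, the $\mathbb{F}'$-linear functions $L \colon \mathbb{F}^{\ell/\ell'} \to \mathbb{F}'$ are in bijective correspondence with the $R'$-linear functions $\bar{L} \colon R/\mathfrak{p} \to R'/\mathfrak{p}'$.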


Proof. For any $\mathbb{F}'$-linear function $L$, we claim that $\bar{L} = h'^{-1} \circ L \circ h$ is the corresponding $R'$-linear function. To see this, note that by Lemma 2.3 and the fact that $h$ is a ring homomorphism, for any $r' \in R'$ and $a \in R/\mathfrak{p}$ we have

$$h(r' \cdot a) = h(r' \bmod \mathfrak{p}) \odot h(a) = h'(r' \bmod \mathfrak{p}') \cdot h(a) \in \mathbb{F}^{\ell/\ell'},$$

where multiplication $\odot$ in $(\mathbb{F}')^{\ell/\ell'}$ and $\mathbb{F}^{\ell/\ell'}$ is coordinate-wise. By $\mathbb{F}'$-linearity of $L$ and the fact that $h'$ is a ring homomorphism, we have

$$\bar{L}(r' \cdot a) = h'^{-1}\big( L(h(r' \cdot a)) \big) = h'^{-1}\big( h'(r' \bmod \mathfrak{p}') \cdot L(h(a)) \big) = r' \cdot \bar{L}(a) \in R'/\mathfrak{p}',$$

as desired. The other direction proceeds essentially identically, with $L = h' \circ \bar{L} \circ h^{-1}$. □


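An application of the Chinese Remainder Theorem with the prime ideals $\bar{\mathfrak{p}}_{i'}$ in $R$, combined with Lemma 2.4, immediately yields the following corollary.

Corollary 2.5. Let $\mathfrak{p}' = \prod_{i' \in G'} \mathfrak{p}'_{i'}$ and $\mathfrak{p} = \prod_{i' \in G'} \bar{\mathfrak{p}}_{i'}$ be the radicals of $pR'$ and $pR$, respectively. Then under the isomorphisms $\{\bar{h}_{i'}\}_{i' \in G'}$ and $\{h'_{i'}\}_{i' \in G'}$ defined above, the $R'$-linear functions $\bar{L} \colon R/\mathfrak{p} \to R'/\mathfrak{p}'$ are in bijective correspondence with the functions $L \colon (\mathbb{F}^{\ell/\ell'})^{\ell'} \to (\mathbb{F}')^{\ell'}$ of the form

$$L = (L_{i'})_{i' \in G'},$$

where every $L_{i'} \colon \mathbb{F}^{\ell/\ell'} \to \mathbb{F}'$ is $\mathbb{F}'$-linear.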


We note that given a finite-field function $L \colon (\mathbb{F}^{\ell/\ell'})^{\ell'} \to (\mathbb{F}')^{\ell'}$ as in Corollary 2.5, we can efficiently find an $R'$-linear function $L \colon R \to R'$ that induces the corresponding $\bar{L}$: first, fix an arbitrary $R'$-basis $B = \{b_j\}$ of $R$. Then, using the isomorphisms $\bar{h}_{i'}$ and $h'_{i'}$, the values of $\bar{L}(b_j \bmod \mathfrak{p}) \in R'/\mathfrak{p}'$ are determined by the finite-field function, and uniquely define $\bar{L}$ by $R'$-linearity. We can then define each $L(b_j) \in R'$ to be an arbitrary representative of $\bar{L}(b_j \bmod \mathfrak{p})$; these choices uniquely determine $L \colon R \to R'$, by $R'$-linearity. Finally, we can represent this $L$ explicitly in trace form as $L(a) = \mathrm{Tr}_{K/K'}(r \cdot a)$ for some $r \in K$: recalling that $K$ is a vector space over $K'$ with $K'$-basis $B$, we have a full-rank system of linear equations $L(b_j) = \mathrm{Tr}_{K/K'}(r \cdot b_j) \in K'$, which we can solve to obtain $r \in K$.

Looking  ahead,  in  our  application  to  homomorphic  computation  we  will  have  certain  linear  functions  that 
we  want  to  evaluate  (e.g.,  projection  functions),  and  we  will  do  so  by  finding  the  corresponding  constant  r, 
then  multiplying  by  r  and  taking  the  trace  (see  Section  3.3  for  further  details).  To  apply  these  steps  in  the 
context  of  a  homomorphic  encryption  scheme,  we  need  the  notion  of  the  dual  of  the  ring  of  integers,  described 
next. 


2.1.4  Duality 

An important and useful object in $K$ is the dual of $R$ (also known as the codifferent of $K$), defined as

$$R^{\vee} = \{ a \in K : \mathrm{Tr}_{K/\mathbb{Q}}(aR) \subseteq \mathbb{Z} \} \supseteq R.$$

Because $\mathrm{Tr}_{K/\mathbb{Q}} = \mathrm{Tr}_{K'/\mathbb{Q}} \circ \mathrm{Tr}_{K/K'}$, it is easy to verify that also $R^{\vee} = \{ a \in K : \mathrm{Tr}_{K/K'}(aR) \subseteq R'^{\vee} \}$. Therefore, we have the convenient equation

$$\mathrm{Tr}_{K/K'}(R^{\vee}) = R'^{\vee}.$$ (2.3)

Note that by contrast, frequently $\mathrm{Tr}_{K/K'}(R)$ does not equal $R'$, but is instead some proper ideal of it.^5 Many other algebraic and geometric advantages of working with $R^{\vee}$ instead of $R$ are discussed in [15, 16].

5 This is easily seen, e.g., for $R = \mathbb{Z}[\zeta_{2^k}]$ and $R' = \mathbb{Z}$, where $\mathrm{Tr}(R) = 2^{k-1} R'$ because $\mathrm{Tr}(1) = 2^{k-1}$ and $\mathrm{Tr}(\zeta_{2^k}^{j}) = 0$ for $j = 1, \ldots, 2^{k-1} - 1$.
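The claim in Footnote 5 can be checked numerically. The following is a minimal sketch added for illustration, not code from the project: it computes the trace of $\zeta_{2^k}^{j}$ as the sum of its images under all embeddings $\zeta \mapsto \omega^{i}$ for odd $i$, using floating-point complex arithmetic. The function name `trace_power` and the choice $k = 4$ are assumptions.

```python
import cmath

def trace_power(k: int, j: int) -> int:
    """Trace over Q of zeta^j, where zeta is a primitive 2^k-th root of unity.

    The trace is the sum of the images of zeta^j under the embeddings
    zeta -> omega^i, for all odd i in [1, 2^k).
    """
    m = 2 ** k
    omega = cmath.exp(2 * cmath.pi * 1j / m)
    total = sum(omega ** (i * j) for i in range(1, m, 2))
    # The exact value is an integer; round away floating-point error.
    return round(total.real)

k = 4
traces = [trace_power(k, j) for j in range(2 ** (k - 1))]
print(traces)  # first entry is 2^(k-1) = 8, all other entries are 0
```

Since the powers $\zeta^{j}$ for $j = 0, \ldots, 2^{k-1} - 1$ form a $\mathbb{Z}$-basis of $R$, these values show that every trace is a multiple of $2^{k-1}$.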
The codifferent is a principal fractional ideal, i.e., $R^{\vee} = t^{-1} R$ for some $t \in R$ (which is not unique). Therefore, division by $t$ induces a bijection from $R$ to $R^{\vee}$, and from any quotient ring $R_p = R/pR$ to $R^{\vee}_p = R^{\vee}/pR^{\vee}$. Although the target objects are not rings (because $R^{\vee} \cdot R^{\vee} \not\subseteq R^{\vee}$), they are $R$-modules, and the bijections are $R$-module isomorphisms.

Of course, we also have $R'^{\vee} = t'^{-1} R'$ for some $t' \in R'$. By Equation (2.3) and $R'$-linearity of the trace, for any ideal $\mathfrak{p}'$ in $R'$, we have

$$\mathrm{Tr}_{K/K'}(R^{\vee}_{\mathfrak{p}'}) = \mathrm{Tr}_{K/K'}(R^{\vee}/\mathfrak{p}' R^{\vee}) = R'^{\vee}/\mathfrak{p}' R'^{\vee} = R'^{\vee}_{\mathfrak{p}'}.$$

In the previous subsection we considered $R'$-linear functions $L \colon R \to R'$ (or their induced functions $R_{\mathfrak{p}} \to R'_{\mathfrak{p}'}$), which can always be expressed as $L(a) = \mathrm{Tr}_{K/K'}(r^{\vee} \cdot a)$ for some fixed $r^{\vee} \in K$. Typically, $r^{\vee}$ is not in $R$ because $\mathrm{Tr}_{K/K'}(R) \neq R'$, but it is easy to see that $r^{\vee} \in t' R^{\vee}$ always, because if not, then $\mathrm{Tr}_{K/K'}(r^{\vee} R) \not\subseteq t' R'^{\vee} = R'$. For the purposes of our field-switching procedure, it will be more convenient to instead work with corresponding $R'$-linear functions from $R^{\vee}$ to $R'^{\vee}$, which can be represented in trace form by elements of $R$. Namely, for an $R'$-linear function $L \colon R \to R'$, where $L(a) = \mathrm{Tr}_{K/K'}(r^{\vee} \cdot a)$ for some $r^{\vee} \in t' R^{\vee}$, we will consider the corresponding function

$$L^{\vee} \colon R^{\vee} \to R'^{\vee}, \qquad L^{\vee}(a^{\vee}) = L(t \cdot a^{\vee})/t' = \mathrm{Tr}_{K/K'}\big( (t/t')\, r^{\vee} \cdot a^{\vee} \big) = \mathrm{Tr}_{K/K'}(r \cdot a^{\vee}),$$

which is represented by $r = (t/t')\, r^{\vee} \in R$.

Following [16], we extend the operation $[\cdot]_q$ to $R^{\vee}$ by fixing a particular $\mathbb{Z}$-basis of $R^{\vee}$ (and $\mathbb{Z}_q$-basis of $R^{\vee}_q$), called the decoding basis, and representing the argument as a $\mathbb{Z}_q$-combination of the basis vectors and applying the $[\cdot]_q$ operation to each of its coefficients. It is shown in [16, Section 6.2] that every sufficiently short (as always, under the canonical embedding) $e \in R^{\vee}$ is indeed the “canonical” representative of its coset modulo $qR^{\vee}$. Specifically, if $\|e\| < q/(2\sqrt{n})$ then $[e \bmod qR^{\vee}]_q = e$.
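The coefficient-wise $[\cdot]_q$ operation can be illustrated with a short sketch. This is an assumption-laden toy, not the decoding-basis routine of [16]: it treats an element simply as a list of integer coefficients with respect to some fixed basis, and takes $[x]_q$ to be the representative of $x$ in $[-q/2, q/2)$. The names `centered_mod` and `round_q` and the modulus $q = 97$ are hypothetical.

```python
def centered_mod(x: int, q: int) -> int:
    """Return the representative of x mod q in the interval [-q/2, q/2)."""
    r = x % q
    return r - q if 2 * r >= q else r

def round_q(coeffs, q):
    """Apply the centered reduction to every coefficient."""
    return [centered_mod(c, q) for c in coeffs]

q = 97
e = [3, -5, 0, 12, -48]              # every coefficient has magnitude below q/2
residues = [c % q for c in e]        # what is known after reduction modulo q
recovered = round_q(residues, q)
print(recovered == e)                # short coefficients are recovered exactly

too_long = [60]                      # magnitude above q/2: recovery fails
print(round_q([c % q for c in too_long], q))
```

The sketch only shows why short coefficient vectors survive reduction modulo $q$; the norm bound $\|e\| < q/(2\sqrt{n})$ in the text is what guarantees short coefficients in the decoding basis.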


2.1.5  Good  Bases  of  $R$  and  $R^{\vee}$


We now have almost all the ingredients we need to describe the homomorphic cryptosystem and our field-switching transformation. The final background material we need concerns the geometry of $R$ as a module over $R'$ (respectively, $R^{\vee}$ as a module over $R'^{\vee}$). Specifically, we construct certain “good” bases of the ring $R$ and its dual $R^{\vee}$ in terms of $R'$ and $R'^{\vee}$ (respectively), and prove some of their useful geometrical properties. This (somewhat technical) material is used only in Section 3.1, where we prove the hardness of ring-LWE over $K$ with secret in $R'$, assuming its hardness over $K'$ with secret in $R'$.

Since $K$ is a vector space of dimension $n/n'$ over $K'$, the field $K$ has a $K'$-basis (which is not unique), i.e., a set of $n/n'$ elements of $K$ that are linearly independent over $K'$, so that every element of $K$ can be represented uniquely as a $K'$-linear combination of the basis elements. Similarly, an $R'$-basis of $R$ is a set of $n/n'$ elements in $R$, such that every element of $R$ can be represented uniquely as an $R'$-linear combination of the basis elements. An $R'^{\vee}$-basis of $R^{\vee}$ is defined analogously.

We wish to construct an $R'$-basis of $R$, and a corresponding dual $R'^{\vee}$-basis of $R^{\vee}$ (any of which are $K'$-bases of $K$), which are “good” in the following sense: for any vector of $K'$-coefficients (with respect to the basis) which are short under $\sigma'$, the corresponding $K$-element is also short under $\sigma$. More formally, represent an ordered $K'$-basis of $K$ as a vector $\mathbf{b} = (b_j) \in K^{n/n'}$, and similarly for an arbitrary vector of $K'$-coefficients $\mathbf{a} = (a_j) \in (K')^{n/n'}$, which defines the $K$-element $a = \langle \mathbf{a}, \mathbf{b} \rangle = \sum_j a_j \cdot b_j$. Then by linearity, the basis $\mathbf{b}$ induces a matrix $B \in \mathbb{C}^{n \times n}$ such that

$$\sigma(a) = B \cdot \sigma'(\mathbf{a}), \quad \text{where } \sigma'(\mathbf{a}) = (\sigma'(a_j))_j.$$ (2.4)
We seek an $R'$-basis $\mathbf{b}$ of $R$ for which $B$ (nearly) preserves Euclidean norms up to some scaling factor, i.e., all of its singular values are (nearly) equal.

In addition, for any $K'$-basis $\mathbf{b} = (b_j)$ of $K$, its dual $K'$-basis $\mathbf{b}^{\vee} = (b_j^{\vee}) \subset K$ is uniquely defined by the linear constraints $\mathrm{Tr}_{K/K'}(b_j \cdot b_{j'}^{\vee}) = 1$ if $j = j'$, and $0$ otherwise. It is a straightforward exercise to verify that if $\mathbf{b}$ is an $R'$-basis of $R$ then $\mathbf{b}^{\vee}$ is an $R'^{\vee}$-basis of $R^{\vee}$. Moreover, the matrix $B^{\vee}$ induced by $\mathbf{b}^{\vee}$ is $B^{\vee} = B^{-T}$, so its singular values are simply the inverses of those of $B$.

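Lemma 2.6. Let $\hat{m} = m/2$ if $m$ is even and $m'$ is odd, otherwise $\hat{m} = m$, and let $r = \mathrm{rad}(m)/\mathrm{rad}(m')$ be the product of all primes that divide $m$ but not $m'$. There exists an efficiently computable $R'$-basis $\mathbf{b}$ of $R$, for which the corresponding matrix $B$ has largest and smallest singular values

$$s_1(B) = \sqrt{\hat{m}/m'} \quad \text{and} \quad s_n(B) = \sqrt{m/(r m')},$$

respectively. In particular, if $r \in \{1, 2\}$ then $B$ is a unitary matrix scaled by a $\sqrt{\hat{m}/m'}$ factor.

Lemma 2.6 implies that for any $\mathbf{a} \in (K')^{n/n'}$ defining $a = \langle \mathbf{a}, \mathbf{b} \rangle \in K$ and $a^{\vee} = \langle \mathbf{a}, \mathbf{b}^{\vee} \rangle \in K$,

$$\|\sigma(a)\| \le \sqrt{\hat{m}/m'} \cdot \|\sigma'(\mathbf{a})\| \quad \text{and} \quad \|\sigma(a^{\vee})\| \le \sqrt{r m'/m} \cdot \|\sigma'(\mathbf{a})\|.$$ (2.5)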


More generally, if the $a_j$ are independent and have Gaussian distributions over (the canonical embedding of) $K'$, then $a$ and $a^{\vee}$ also have (possibly non-spherical) Gaussian distributions over $K$.^6 Since we are not too concerned with the exact distributions, we omit a precise calculation, which is standard. However, one particular case of interest is when the $a_j$ are all i.i.d. according to a spherical Gaussian of parameter $s$, and $r \in \{1, 2\}$ so that $B$ (respectively, $B^{\vee}$) is a scaled unitary matrix. Then because spherical Gaussians are invariant under unitary transformations, $a$ (resp., $a^{\vee}$) is distributed according to a spherical Gaussian of parameter $s\sqrt{\hat{m}/m'}$ (resp., $s\sqrt{m'/\hat{m}}$).

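The remainder of this subsection is devoted to proving Lemma 2.6. We denote the $k$-dimensional identity matrix by $I_k$, we use $\otimes$ to denote the Kronecker (or tensor) product of vectors and matrices, and we apply functions to vectors and matrices component-wise.

Following the treatment given in [16], let $m = \prod_{\ell} m_{\ell}$ be the prime-power factorization of $m$, i.e., the $m_{\ell} > 1$ are powers of distinct primes. The ring $R = \mathbb{Z}[\zeta_m]$ has the following $\mathbb{Z}$-basis $\mathbf{p}$, which is called the “powerful” basis:

$$\mathbf{p} = \bigotimes_{\ell} \mathbf{p}_{m_{\ell}}, \quad \text{where } \mathbf{p}_{m_{\ell}} = \big( \zeta_{m_{\ell}}^{j} \big)_{j \in [\varphi(m_{\ell})]}.$$

The set $\mathbf{p}_{m_{\ell}}$ is called the “power” $\mathbb{Z}$-basis of $\mathbb{Z}[\zeta_{m_{\ell}}] = \mathbb{Z}[\zeta_m^{m/m_{\ell}}] \subseteq R$.

Similarly, let $m' = \prod_{\ell} m'_{\ell}$ where each $m'_{\ell}$ divides $m_{\ell}$, i.e., they are both powers of the same prime (though possibly $m'_{\ell} = 1$). Then the powerful $\mathbb{Z}$-basis of $R'$ is defined as $\mathbf{p}' = \bigotimes_{\ell} \mathbf{p}_{m'_{\ell}}$, where the power bases $\mathbf{p}_{m'_{\ell}}$ are defined as above. Notice that when $m'_{\ell} > 1$, there is a bijective correspondence between $j \in [\varphi(m_{\ell})]$ and $(j', k) \in [\varphi(m'_{\ell})] \times [m_{\ell}/m'_{\ell}]$, via $j = (m_{\ell}/m'_{\ell})\, j' + k$. Therefore, the power bases $\mathbf{p}_{m_{\ell}}$ factor as

$$\mathbf{p}_{m_{\ell}} = \mathbf{p}_{m'_{\ell}} \otimes \mathbf{b}_{\ell}, \quad \text{where } \mathbf{b}_{\ell} = \begin{cases} \big( \zeta_{m_{\ell}}^{k} \big)_{k \in [m_{\ell}/m'_{\ell}]} & \text{if } m'_{\ell} > 1 \\ \mathbf{p}_{m_{\ell}} & \text{if } m'_{\ell} = 1. \end{cases}$$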

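Hence, using the commutativity of the Kronecker product (up to some permutation) we can factor the powerful basis $\mathbf{p}$ of $R$ as

$$\mathbf{p} = \mathbf{p}' \otimes \mathbf{b}, \quad \text{where } \mathbf{b} = \bigotimes_{\ell} \mathbf{b}_{\ell}.$$ (2.6)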

6 To be completely formal, the Gaussians should be over continuous spaces of the form $K \otimes_{\mathbb{Q}} \mathbb{R}$; see [15].
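Because $\mathbf{p}'$ is a $\mathbb{Z}$-basis of $R'$, it follows that $\mathbf{b}$ is an $R'$-basis of $R$. We next calculate the matrix $B \in \mathbb{C}^{n \times n}$ induced by $\mathbf{b}$, and verify that it indeed satisfies the claims in the lemma statement.

Following [16, Section 3], for any prime power $\tilde{m}$ we define $\mathrm{CRT}_{\tilde{m}}$ to be the complex $\varphi(\tilde{m})$-by-$\varphi(\tilde{m})$ matrix with $\omega_{\tilde{m}}^{ij}$ in its $i$th row and $j$th column, for $i \in \mathbb{Z}^*_{\tilde{m}}$ and $j \in [\varphi(\tilde{m})]$. Using the prime-power factorizations of our $m, m'$, we define $\mathrm{CRT}_m = \bigotimes_{\ell} \mathrm{CRT}_{m_{\ell}}$ and $\mathrm{CRT}_{m'} = \bigotimes_{\ell} \mathrm{CRT}_{m'_{\ell}}$. Then up to a permutation of the rows (determined by the CRT correspondence between $\mathbb{Z}^*_m$ and $\prod_{\ell} \mathbb{Z}^*_{m_{\ell}}$) we have

$$\sigma(\mathbf{p}^{T}) = \mathrm{CRT}_m,$$

i.e., the columns of $\mathrm{CRT}_m$ are $\sigma(p_j)$ for each entry $p_j$ of the row vector $\mathbf{p}^{T}$. In particular, $\sigma(\langle \mathbf{c}, \mathbf{p} \rangle) = \mathrm{CRT}_m \cdot \mathbf{c}$ for any $\mathbf{c} \in \mathbb{Q}^{n}$. Similarly, $\sigma'((\mathbf{p}')^{T}) = \mathrm{CRT}_{m'}$ up to a row permutation.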

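We now claim that, up to some permutations of $B$'s rows and columns,

$$B = \mathrm{CRT}_m \cdot (\mathrm{CRT}_{m'}^{-1} \otimes I_{n/n'}) = \bigotimes_{\ell} \Big( \mathrm{CRT}_{m_{\ell}} \cdot \big( \mathrm{CRT}_{m'_{\ell}}^{-1} \otimes I_{\varphi(m_{\ell})/\varphi(m'_{\ell})} \big) \Big),$$ (2.7)

where the second equality follows by the mixed-product property and the commutativity (up to row and column permutations) of the Kronecker product. To see the first equality, notice that for any $\mathbf{a} \in (K')^{n/n'}$ defining $a = \langle \mathbf{a}, \mathbf{b} \rangle \in K$, the matrix $(\mathrm{CRT}_{m'}^{-1} \otimes I)$ maps from (a suitable permutation of) the concatenated embeddings $\sigma'(\mathbf{a})$, to a vector $\mathbf{c} \in \mathbb{Q}^{n}$ of coefficients such that $\mathbf{a} = \langle \mathbf{c}, \mathbf{p}' \otimes I_{n/n'} \rangle$. In addition,

$$a = \langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{c}^{T} \cdot (\mathbf{p}' \otimes I_{n/n'}) \cdot \mathbf{b} = \langle \mathbf{c}, \mathbf{p}' \otimes \mathbf{b} \rangle = \langle \mathbf{c}, \mathbf{p} \rangle.$$

Therefore, $\sigma(a) = \mathrm{CRT}_m \cdot \mathbf{c} = \mathrm{CRT}_m \cdot (\mathrm{CRT}_{m'}^{-1} \otimes I) \cdot \sigma'(\mathbf{a})$, as desired.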

Now, by the last expression in Equation (2.7), and because singular values are multiplicative under the Kronecker product, from now on we drop all the $\ell$ subscripts, and assume without loss of generality that $m$ and $m'$ are powers of the same prime $p$ (where possibly $m' = 1$). We analyze the singular values of $\mathrm{CRT}_m (\mathrm{CRT}_{m'}^{-1} \otimes I)$, for the cases $m' = 1$ and $m' > 1$. In the first case, clearly $\mathrm{CRT}_{m'} = I_1$, and it is shown in [16, Section 4] that the largest singular value of $\mathrm{CRT}_m$ is $\sqrt{m/2}$ if $m$ is even and $\sqrt{m}$ otherwise, and its smallest singular value is $\sqrt{m/p}$.
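As a small sanity check of the $m' = 1$ case, consider $m = 4$ (so $p = 2$). This worked example is added for illustration and assumes $\omega_4 = i$, with rows indexed by $\mathbb{Z}_4^* = \{1, 3\}$ and columns by $j \in \{0, 1\}$:

```latex
\mathrm{CRT}_4 =
\begin{pmatrix} \omega_4^{1 \cdot 0} & \omega_4^{1 \cdot 1} \\ \omega_4^{3 \cdot 0} & \omega_4^{3 \cdot 1} \end{pmatrix}
= \begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix},
\qquad
\mathrm{CRT}_4^{*}\, \mathrm{CRT}_4 =
\begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}
\begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.
```

Both singular values are therefore $\sqrt{2}$, which matches the stated largest value $\sqrt{m/2} = \sqrt{2}$ for even $m$ and the stated smallest value $\sqrt{m/p} = \sqrt{2}$.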

For the case $m' > 1$, it follows from the decompositions given in [16, Section 3] that, up to some row permutation,

$$\mathrm{CRT}_m = \sqrt{m/p} \cdot Q \cdot (\mathrm{CRT}_p \otimes I_{m/p})$$

for some unitary matrix $Q$, and similarly for $\mathrm{CRT}_{m'}$. Then a routine calculation using elementary properties of the Kronecker product reveals that $\mathrm{CRT}_m (\mathrm{CRT}_{m'}^{-1} \otimes I)$ is some unitary matrix scaled by a $\sqrt{m/m'}$ factor, so all its singular values are $\sqrt{m/m'}$. This completes the proof of Lemma 2.6.

2.2  Homomorphic  Cryptosystems 

In ring-LWE-based cryptosystems for arbitrary cyclotomics [16] (generalizing those of [15, 4, 3]), the plaintext space is $R_p$ for some integer $p \ge 2$ that is coprime with all the odd primes dividing $m$. We assume that $p$ is prime, which is without loss of generality by the Chinese Remainder Theorem. Ciphertexts are elements of $(R^{\vee}_q)^2$ for some integer $q$ that is coprime with $p$, and the secret key is some $s \in R$. A ciphertext $c = (c_0, c_1) \in (R^{\vee}_q)^2$ that encrypts a plaintext $b \in R_p$ with respect to $s$ satisfies the decryption relation

$$c_0 + c_1 \cdot s = e \pmod{qR^{\vee}}$$ (2.8)

for some sufficiently short $e \in R^{\vee}$ such that $t \cdot e = b \pmod{pR}$. (Recall that $R^{\vee} = t^{-1} R$ for some $t \in R$, so $t \cdot e \in R$.) We refer to $e$ as the noise of the ciphertext. Throughout this work we implicitly assume that the modulus $q$ is large enough relative to $\|e\|$, so that $[c_0 + c_1 \cdot s]_q = e \in R^{\vee}$ (see Section 2.1.4 above). Therefore, the decryption algorithm can simply compute $e$ and output $t \cdot e \bmod pR$. As shown in [4, 3, 16], this system (augmented by some additional public values, for greater efficiency) supports additive and multiplicative homomorphisms.


3  The  Field-Switching  Procedure 

Our procedure performs the following operation. Given a big-field ciphertext $c \in (R^{\vee}_q)^2$ that encrypts a plaintext $b \in R_{\mathfrak{p}}$ with respect to a big-ring secret key $s \in R$, and a description of an $R'$-linear function $L \colon R_{\mathfrak{p}} \to R'_{\mathfrak{p}'}$ to apply to the plaintext (where recall that $\mathfrak{p}$ and $\mathfrak{p}'$ are the radicals of $p$ in $R$ and $R'$, respectively), it outputs a small-field ciphertext $c' \in (R'^{\vee}_q)^2$ that encrypts $b' = L(b) \in R'_{\mathfrak{p}'}$, with respect to some small-ring secret key $s' \in R'$. (Recall that Corollary 2.5 characterizes how $L$ corresponds to the induced function $\mathbb{F}^{\ell} \to (\mathbb{F}')^{\ell'}$ that is applied to the vector of finite field elements encoded by $b$.)

The  procedure  consists  of  the  following  three  steps: 

1. Switch to a small-ring secret key. We use the key-switching method from [5, 3, 16] to produce a ciphertext which is still over the big field $K$ and encrypts the same plaintext $b \in R_{\mathfrak{p}}$, but with respect to a secret key $s' \in R' \subseteq R$ belonging to the small subring.

2. Multiply by an appropriate (short) scalar. We multiply the components of the resulting ciphertext by a short element $r \in R$ that corresponds to the desired $R'$-linear function to be applied to the input plaintext $b$.

3. Map to the small field. We map the resulting big-field ciphertext (over $R^{\vee}_q$) to a small-field ciphertext (over $R'^{\vee}_q$) simply by taking the trace $\mathrm{Tr}_{K/K'}$ of its two components. The resulting ciphertext will still be with respect to the small-ring secret key $s' \in R'$, but will encrypt the plaintext $b' = L(b) \in R'_{\mathfrak{p}'}$.

Note that Steps 2 and 3 can be repeated multiple times on the same ciphertext (from Step 1), to apply several different $R'$-linear functions. In this way, the entire input plaintext can be preserved, but in a decomposed form.
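The algebra behind Steps 2 and 3 (multiply by a fixed $r$, then take the trace) can be illustrated on plain ring elements, without any encryption or modular reduction. The sketch below is a toy under stated assumptions, not the project's implementation: it takes $K = \mathbb{Q}(\zeta_8)$ and $K' = \mathbb{Q}(i)$, represents elements of $R = \mathbb{Z}[\zeta_8] \cong \mathbb{Z}[x]/(x^4 + 1)$ as length-4 coefficient lists, and uses the fact that the two automorphisms of $K$ fixing $K'$ send $\zeta_8$ to $\zeta_8$ and to $\zeta_8^5 = -\zeta_8$, so the trace doubles the even-power coefficients and kills the odd ones. All function names are hypothetical.

```python
def mul_mod_x4_plus_1(a, b):
    """Multiply two elements of Z[x]/(x^4 + 1), given as length-4 coefficient lists."""
    out = [0] * 4
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < 4:
                out[k] += ai * bj
            else:
                out[k - 4] -= ai * bj   # reduce using x^4 = -1
    return out

def trace_to_gaussian_integers(a):
    """Tr_{K/K'} for K = Q(zeta_8), K' = Q(i); result is in the basis (1, i) of Z[i]."""
    return [2 * a[0], 2 * a[2]]

a = [5, 7, -2, 3]                       # a = 5 + 7*zeta - 2*zeta^2 + 3*zeta^3

r_identity = [1, 0, 0, 0]               # r = 1
r_shift = [0, 0, 0, -1]                 # r = zeta^{-1} = -zeta^3

even_part = trace_to_gaussian_integers(mul_mod_x4_plus_1(r_identity, a))
odd_part = trace_to_gaussian_integers(mul_mod_x4_plus_1(r_shift, a))
print(even_part, odd_part)              # [10, -4] and [14, 6]
```

The two choices of $r$ yield two different $R'$-linear maps, which together retain all four coefficients of $a$ (up to the factor 2), mirroring the remark that repeating Steps 2 and 3 preserves the whole plaintext in decomposed form.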


3.1  Step  1:  Switching  to  a  Small-Ring  Secret  Key 

To switch to a small-field secret key, we publish a “key-switching hint,” which essentially encrypts the big-ring secret key $s \in R$ under the small-ring key $s' \in R'$, using ciphertexts over the big field. Note that encrypting $s$ under a small-ring secret key $s'$ has security implications, since the dimension of the underlying RLWE problem is smaller. In our case, though, the ultimate goal is to switch to a ciphertext over the smaller field, so we will not lose any additional security by publishing the hint. Indeed, we show below that assuming the hardness of the decision RLWE problem in the small field, the key-switching hint reveals nothing about the big-ring secret key. The essence of that claim is Lemma 3.1 below, which says (informally) that RLWE in the big field, with secret chosen in the small ring $R' \subseteq R$, is no easier than RLWE in the small field.


Ring-LWE. The ring-LWE (RLWE) problem [15] (in $K$) with continuous error is parameterized by a modulus $q$, a “secret distribution” $\upsilon$ over $R$, and an “error distribution” $\psi$ over $K$, which is usually a Gaussian (in the canonical embedding) and is therefore concentrated on short elements.^7 For $s \in R$, define the distribution $A_{s,\psi}$ that is sampled by choosing $a \in R^{\vee}_q$ uniformly at random, choosing $e \leftarrow \psi$, and outputting the pair $(a, \beta = a \cdot s + e \bmod qR^{\vee}) \in R^{\vee}_q \times K/qR^{\vee}$. One equivalent form of the (average-case) decision $\mathrm{RLWE}_{q,\psi,\upsilon}$ problem (in $K$) is, given some $\ell$ pairs $(a_i, \beta_i) \in R^{\vee}_q \times K/qR^{\vee}$, distinguish between the following two cases: in one case, the pairs are chosen independently from $A_{s,\psi}$ for a random $s \leftarrow \upsilon$ (which remains the same for all samples); in the other case, the pairs are all independent and uniformly random over $R^{\vee}_q \times K/qR^{\vee}$. For appropriate parameters $q$, $\psi$, $\upsilon$ and $\ell$, solving this decision problem with non-negligible distinguishing advantage is as hard as approximating the shortest vector problem on ideal lattices in $R$, via a quantum reduction. See [15, 16] for precise statements and further details.

Let $\mathbf{b}^{\vee} = (b_j^{\vee})_{j \in [n/n']}$ be any $R'^{\vee}$-basis of $R^{\vee}$, and hence a $K'$-basis of $K$. Then for any error distribution $\psi'$ over $K'$, we can define an error distribution $\psi$ over $K$ as $\psi = \langle (\psi')^{n/n'}, \mathbf{b}^{\vee} \rangle$, i.e., a sample from $\psi$ is generated by choosing independent $e_j \leftarrow \psi'$ (for $j \in [n/n']$) and outputting $e = \sum_j e_j b_j^{\vee} \in K$.

Lemma 3.1. Let $\psi'$ be an error distribution over $K'$, and let $\psi = \langle (\psi')^{n/n'}, \mathbf{b}^{\vee} \rangle$ be the error distribution over $K$ as described above. If the decision $\mathrm{RLWE}_{q,\psi',\upsilon'}$ problem (in $K'$) is hard for some distribution $\upsilon'$ over $R' \subseteq R$, then the decision $\mathrm{RLWE}_{q,\psi,\upsilon'}$ problem (in $K$) is also hard.

Although the lemma holds for any $R'^{\vee}$-basis of $R^{\vee}$, it is most useful with a basis having “good geometric properties.” Specifically, in our case we need the property that if $\psi'$ is concentrated on short elements of $K'$, then $\psi$ is similarly concentrated on short elements of $K$. Such a basis $\mathbf{b}^{\vee}$ is constructed in Lemma 2.6 of Section 2.1.5. For example, if $\psi'$ is a continuous (spherical) Gaussian with parameter $s$ and $r = \mathrm{rad}(m)/\mathrm{rad}(m') = 1$, then $\psi$ is a spherical Gaussian with parameter $s\sqrt{m'/m} = s\sqrt{n'/n}$.^8


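Proof. It suffices to give an efficient, deterministic reduction that takes $n/n'$ pairs $(a_j, \beta_j) \in R'^{\vee}_q \times K'/qR'^{\vee}$ and outputs a single pair $(a, \beta) \in R^{\vee}_q \times K/qR^{\vee}$, with the following properties: if the pairs $(a_j, \beta_j)$ are i.i.d. according to $A_{s',\psi'}$ for some $s' \in R'$, then $(a, \beta)$ is distributed according to $A_{s',\psi}$; and if the pairs $(a_j, \beta_j)$ are independent and uniformly random, then $(a, \beta)$ is uniformly random. The reduction simply outputs $(a = \langle \mathbf{a}, \mathbf{b}^{\vee} \rangle, \beta = \langle \boldsymbol{\beta}, \mathbf{b}^{\vee} \rangle)$, where $\mathbf{a} = (a_j)_j$ and $\boldsymbol{\beta} = (\beta_j)_j$.

Since $\mathbf{b}^{\vee}$ is an $R'^{\vee}$-basis of $R^{\vee}$ and hence an $R'^{\vee}_q$-basis of $R^{\vee}_q$, it is immediate that the reduction maps the uniform distribution to the uniform distribution. On the other hand, if the samples $(a_j, \beta_j)$ are drawn from $A_{s',\psi'}$, i.e., $\beta_j = a_j \cdot s' + e_j \bmod qR'^{\vee}$ for $e_j \leftarrow \psi'$, then $a$ is still uniformly random, and

$$\beta = \langle \boldsymbol{\beta}, \mathbf{b}^{\vee} \rangle = \langle \mathbf{a}, \mathbf{b}^{\vee} \rangle \cdot s' + \langle \mathbf{e}, \mathbf{b}^{\vee} \rangle = a \cdot s' + e \pmod{qR^{\vee}},$$

where $\mathbf{e} = (e_j)_j$ and $e$ has distribution $\psi$. This completes the proof. □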


Key switching. In [5, 3, 16] it is shown how, given an $s \in R$ and sufficiently many RLWE samples (over $K$) with short noise and any secret $s' \in R$, it is possible to generate a “key-switching hint” with the following functionality: given the hint and any valid ciphertext $c$ (over $K$) encrypted under $s$ and with sufficiently short noise, it is possible to efficiently generate a ciphertext $c'$ (also over $K$) with short noise encrypted under $s'$. Moreover, the hint is indistinguishable from uniformly random over its domain (even given $s$), assuming that the RLWE samples are.

7 Again, to be completely formal, a Gaussian should be defined over $K_{\mathbb{R}}$; see Footnote 6.

8 Note that the factor $\sqrt{n'/n} < 1$ does not really amount to any effective decrease in the noise, because the “sparsity” of $R'^{\vee}$ versus $R^{\vee}$ is greater by a corresponding factor.
For our transformation, we apply Lemma 3.1 using the “good basis” $\mathbf{b}^{\vee}$ from Lemma 2.6, thus obtaining RLWE samples over $K$ relative to the secret $s' \in R' \subseteq R$, with noise distribution $\psi$ which is concentrated on short vectors, and with security based on the hardness of the $\mathrm{RLWE}_{q,\psi',\upsilon'}$ problem in $K'$. We then construct the key-switching hint from these samples as described in [16, Section 8.3].


3.2  Steps  2  and  3:  Mapping  to  the  Small  Field 


Our goal now is to transform a valid big-field ciphertext $c = (c_0, c_1) \in (R^{\vee}_q)^2$, which encrypts some $b \in R_{\mathfrak{p}}$ with respect to some secret key $s' \in R' \subseteq R$, into a small-field ciphertext $c' = (c'_0, c'_1) \in (R'^{\vee}_q)^2$ that encrypts the related plaintext $b' = L(b)$ with respect to the same secret key $s'$, where $L \colon R_{\mathfrak{p}} \to R'_{\mathfrak{p}'}$ is any desired $R'$-linear function.

The  process  works  as  follows: 


1. Since $L$ is $R'$-linear, by the discussion at the end of Section 2.1.3 and in Section 2.1.4 we can find some $r^{\vee} \in t' R^{\vee}$ such that $L(a) = \mathrm{Tr}_{K/K'}(r^{\vee} \cdot a) \bmod \mathfrak{p}'$.

2. We then find a short representative $r \in (t/t')\, r^{\vee} + pR \in R_p$, using a “good” basis of $pR$ (i.e., one that has small singular values under $\sigma$, e.g., the “powerful” basis as constructed in Section 2.1.5).

The chosen $r$ defines the $R'$-linear function $L^{\vee} \colon R^{\vee} \to R'^{\vee}$ of the form $L^{\vee}(a^{\vee}) = \mathrm{Tr}_{K/K'}(r \cdot a^{\vee})$, whose induced function from $R^{\vee}_{\mathfrak{p}}$ to $R'^{\vee}_{\mathfrak{p}'}$ satisfies

$$t' \cdot L^{\vee}(a^{\vee}) = L(t \cdot a^{\vee}) \pmod{\mathfrak{p}'}.$$ (3.1)

3. We obtain our small-field ciphertext by applying $L^{\vee}$ (or more precisely, the induced function from $R^{\vee}_q$ to $R'^{\vee}_q$) to $c_0, c_1$, setting

$$c'_i = L^{\vee}(c_i) = \mathrm{Tr}_{K/K'}(r \cdot c_i) \in R'^{\vee}_q, \quad i = 0, 1.$$


Lemma 3.2. The ciphertext $c' = (c'_0, c'_1)$ is an encryption of $b' = L(b) \in R'_{\mathfrak{p}'}$, under secret key $s' \in R'$, with noise $e' = L^{\vee}(e) \in R'^{\vee}$ of length $\|e'\| \le \|e\| \cdot \|r\|_{\infty} \cdot \sqrt{n/n'}$, where $e$ is the noise in the original ciphertext $c$.

We note that the $\sqrt{n/n'}$ factor in the bound on $\|e'\|$ does not actually amount to any effective increase in the noise, because the dimension has decreased by a corresponding factor, and hence the size of $e'$ relative to $R'^{\vee}$ remains the same as that of $e$ relative to $R^{\vee}$. More precisely, the original ciphertext $c$ decrypts correctly if $q > 2\sqrt{n}\, \|e\|$, whereas $c'$ decrypts correctly if $q > 2\sqrt{n'}\, \|e'\|$ (see Section 2.1.4). Therefore, the only practical increase in the noise is due solely to $\|r\|_{\infty}$.
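The last remark can be made explicit by substituting the bound of Lemma 3.2 into the decryption condition for $c'$. The following one-line derivation is added for illustration and uses only the quantities defined above:

```latex
2\sqrt{n'}\,\|e'\|
\;\le\; 2\sqrt{n'} \cdot \|e\| \cdot \|r\|_{\infty} \cdot \sqrt{n/n'}
\;=\; \big( 2\sqrt{n}\,\|e\| \big) \cdot \|r\|_{\infty}.
```

So the threshold that $q$ must exceed for $c'$ is at most the threshold for $c$ multiplied by $\|r\|_{\infty}$; the $\sqrt{n/n'}$ factor cancels against the smaller dimension.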


Proof. We need to show three things: that $\|e'\|$ is bounded as claimed, that $c'_0 + c'_1 \cdot s' = e' \pmod{qR'^{\vee}}$, and that $t' \cdot e' = b' = L(b) \pmod{\mathfrak{p}'}$.

1. The first claim follows immediately by Corollary 2.2 and the inequality $\|r \cdot e\| \le \|r\|_{\infty} \cdot \|e\|$.

2. For the second claim, recall that $c_0 + c_1 \cdot s' = e \pmod{qR^{\vee}}$. Then because the induced function $L^{\vee} \colon R^{\vee}_q \to R'^{\vee}_q$ is $R'$-linear and $s' \in R'$, we have

$$c'_0 + c'_1 \cdot s' = L^{\vee}(c_0 + c_1 \cdot s') = L^{\vee}(e) = e' \pmod{qR'^{\vee}}.$$

3. For the last claim, because $t \cdot e = b \bmod pR$ and by Equation (3.1), we have

$$t' \cdot e' = t' \cdot L^{\vee}(e) = L(t \cdot e) = L(b) \pmod{\mathfrak{p}'}.$$

□
3.3  Applying  the  Field-Switching  Procedure


A typical application of the field-switching procedure during homomorphic evaluation of some circuit will begin with a big-field ciphertext that encrypts an array of plaintext values in the subfield $\mathbb{F}'$, as embedded in $\mathbb{F}$. The above procedure is then applied to decompose the ciphertext into a number of small-field ciphertexts, each encrypting a subset of the plaintext values. Since big-field ciphertexts have room for $\ell$ plaintext elements, but small-field ciphertexts can only hold $\ell'$ elements, we may need up to $\ell/\ell'$ small-field ciphertexts to hold all the plaintext values that we are interested in. That is, we apply our field-switching transformation using the $\ell'$-fold concatenations $\bar{L}_i$ of the $\mathbb{F}'$-linear selection functions $L_i \colon \mathbb{F}^{\ell/\ell'} \to \mathbb{F}'$, $i \in [\ell/\ell']$, where $L_i$ just selects the $i$th value (in $\mathbb{F}'$).

Referring to Figure 1 for an example, the big-field ciphertext holds (up to) six plaintext values, and each small-field ciphertext can hold two values, with the big-field plaintext “slots” corresponding to $\mathfrak{p}_1, \mathfrak{p}_{15}, \mathfrak{p}_{22}$ lying over the small-field plaintext slot of $\mathfrak{p}'_1$, and the big-field slots corresponding to $\mathfrak{p}_3, \mathfrak{p}_{17}, \mathfrak{p}_{31}$ lying over the small-field plaintext slot of $\mathfrak{p}'_3$. Then we can produce three small-field ciphertexts, using the three selection functions

$$(x_1, x_{15}, x_{22}, \; x_3, x_{17}, x_{31}) \mapsto (x_1, x_3),$$

$$(x_1, x_{15}, x_{22}, \; x_3, x_{17}, x_{31}) \mapsto (x_{15}, x_{17}),$$

$$(x_1, x_{15}, x_{22}, \; x_3, x_{17}, x_{31}) \mapsto (x_{22}, x_{31}).$$
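At the level of plaintext slots, the three selection functions simply regroup the six values. The sketch below is a plain-data illustration added here (no encryption involved); the function name `selection_functions` and the use of strings as stand-ins for field elements are assumptions.

```python
def selection_functions(slots_by_small_prime):
    """Regroup big-field slots into small-field plaintext vectors.

    slots_by_small_prime: one inner list per small-field slot, holding the
    big-field slot values that lie over it (all inner lists of equal length).
    Returns one tuple per small-field ciphertext: the i-th tuple takes the
    i-th value from every group.
    """
    group_size = len(slots_by_small_prime[0])
    if any(len(group) != group_size for group in slots_by_small_prime):
        raise ValueError("all groups must have the same number of slots")
    return [tuple(group[i] for group in slots_by_small_prime)
            for i in range(group_size)]

# Slots lying over p'_1 and over p'_3, as in the example above.
groups = [["x1", "x15", "x22"], ["x3", "x17", "x31"]]
outputs = selection_functions(groups)
print(outputs)  # [('x1', 'x3'), ('x15', 'x17'), ('x22', 'x31')]
```

Each output tuple corresponds to the plaintext of one small-field ciphertext, so three applications of Steps 2 and 3 cover all six input values.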
Abstract

Accelerating  the  development  of  a  practical  Fully 
Homomorphic  Encryption  (FHE)  scheme  is  the  goal  of  the 
DARPA  PROCEED  program.  For  the  past  year,  this 
program  has  had  as  its  focus  the  acceleration  of  various 
aspects  of  the  FHE  concept  toward  practical 
implementation  and  use.  FHE  would  be  a  game-changing 
technology  to  enable  secure,  general  computation  on 
encrypted  data,  e.g.,  on  untrusted  off-site  hardware. 
However,  FHE  will  still  require  several  orders  of  magnitude 
improvement  in  computation  before  it  will  be  practical  for 
widespread  use. 

Recent theoretical breakthroughs demonstrated the
existence of FHE schemes [1, 2], and to date much progress
has been made in both algorithmic and implementation
improvements. Specifically, our contribution to the PROCEED
program has been the development of FPGA-based
hardware primitives to accelerate the computation on
encrypted data using FHE based on lattice techniques [3].
Our project, SIPHER, has been using a state-of-the-art
toolchain developed by MathWorks to implement VHDL code
for FPGA circuits directly from Simulink models. Our
baseline  Homomorphic  Encryption  prototypes  are 
developed  directly  in  Matlab  using  the  fixed  point  toolbox 
to  perform  the  required  integer  arithmetic.  Constant 
improvements  in  algorithms  require  us  to  be  able  to  quickly 
implement  them  in  a  high  level  language  such  as  Matlab. 
We  reported  on  our  initial  results  at  HPEC  2011  [4].  In  the 
past  year,  increases  in  algorithm  complexity  have 
introduced  several  new  design  requirements  for  our  FPGA 
implementation.  This  report  presents  new  Simulink 
primitives  that  had  to  be  developed  to  deal  with  these  new 
requirements. 

A  review  of  Fully  and  Somewhat 
Homomorphic  Encryption 

Fully  Homomorphic  Encryption  (FHE)  holds  the  promise  to 
securely  run  arbitrary  computations  over  encrypted  data  on 
untrusted  computation  hosts  [2].  The  general  FHE  concept 
of  operations  is  that  sensitive  data  is  encrypted  with  a 
public  key,  then  sent  to  an  untrusted  computation  host, 
which  can  perform  arbitrary  computations  on  the  encrypted 
data  without  first  needing  to  decrypt  it.  It  has  been  shown 
to  be  theoretically  possible  to  evaluate  arbitrary  programs 
using  just  two  special  purpose  FHE  operations,  EvalAdd 
and  EvalMult,  which  at  the  simplest  level,  roughly 
correspond  to  bitwise  XOR  and  AND  gates  operating  on 
encrypted  bits.  A  sequence  of  these  operations  is  run 
against  the  encrypted  data,  resulting  in  an  encryption  of  the 
output of the original program run on the unencrypted data.
This  encrypted  result  can  then  be  sent  back  to  the  original 
client,  who  decrypts  the  result  using  its  secret  key.  The 
encrypted  data  is  protected  at  all  times  with  reasonable 
security  guarantees  based  on  computational  hardness 
results. 

A ‘Fully’ Homomorphic Encryption scheme allows an
unlimited number of these Eval operations to be performed.
All  known  FHE  schemes  are  based  on  computationally  hard 
stochastic  lattice  theory  problems,  which  add  some  noise 
with  each  operation  and  require  a  very  computationally 
expensive  “recryption”  operation  that  is  periodically  run  on 
intermediate  ciphertexts  to  keep  the  noise  at  a  level  that  still 
permits  decryption.  A  ‘Somewhat’  Homomorphic  scheme, 
on  the  other  hand,  supports  several  (but  not  unlimited) 
EvalMult  and  EvalAdd  operations  while  preserving  the 
correctness of decryption. In other words, SHE schemes can
support secure computation for only a small subset of
programs.  By  focusing  on  an  SHE  scheme,  we  can  direct 
our  research  towards  the  implementation  of  efficient 
hardware  primitives,  while  the  FHE  community  develops 
more  efficient  recryption  algorithms. 

Recent  Developments  in  the  SIPHER  SHE 
Scheme 

Our  current  SHE  scheme  relies  on  operations  that  are 
generally  inefficient  to  implement  on  standard  CPU 
architectures (i.e. modular arithmetic with a large modulus).
The EvalAdd and EvalMult operations, for example, are
element-wise vector additions and multiplications taken modulo
some particular prime integer q. These are trivial to express
using Matlab: c = mod(a+b, q) and c = mod(a.*b, q).
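
For readers without Matlab, the same two element-wise operations can be sketched in Python. This is an illustrative sketch only: the function names `eval_add` and `eval_mult` and the toy modulus 97 are assumptions, and the vectors are plain residues rather than real ciphertexts.

```python
def eval_add(a, b, q):
    """Element-wise addition modulo q, mirroring c = mod(a+b, q)."""
    return [(x + y) % q for x, y in zip(a, b)]

def eval_mult(a, b, q):
    """Element-wise multiplication modulo q, mirroring c = mod(a.*b, q)."""
    return [(x * y) % q for x, y in zip(a, b)]

q = 97                     # toy modulus chosen for illustration
a = [5, 96, 40]
b = [10, 3, 60]
sum_mod_q = eval_add(a, b, q)
prod_mod_q = eval_mult(a, b, q)
```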

For  convenience  most  of  the  previously  published  SHE  and 
FHE  implementations  have  used  standard  tools  such  as  the 
GNU  Multiple  Precision  Arithmetic  Library  (GMP)  [5], 
which  enable  researchers  to  code  operations  using  very 
large  integers.  This  limits  their  focus  to  operations  on  CPUs 
and  does  not  allow  them  to  take  advantage  of  specialized 
parallel  computation  hardware  like  FPGAs  which  provide 
highly  cost-effective  parallelism.  Our  approach  to 
developing  the  FPGA  code  for  implementing  EvalAdd  and 
EvalMult  is  to  develop  arithmetic  circuits  that  will  achieve 
high  throughput  by  using  parallelism  and  pipelining  on  the 
FPGA. 

We  initially  develop  prototype  descriptions  in  Matlab  that 
we  re-implement  in  a  stream-oriented  hardware 
implementable  manner  in  Simulink.  The  results  of  the 
implementations  are  compared  to  verify  correctness.  A 
conversion from Simulink to VHDL is done in a completely
automated fashion using MathWorks’ HDL Coder. This tool
chain provides us the means to develop our primitives,
including cyclic VHDL based FPGA prototyping, much
faster than traditional methods. Some examples of
efficiency are:

1.  The  Matlab  and  Simulink  Models  are  driven  with 
the  same  fixed  point  data  variables,  and  generate 
the  same  format  output,  simplifying  test  and 
comparison 

2.  The  bit  width  of  the  circuits  is  specified  at  compile 
time  by  specifying  the  bit  width  of  the  input  data. 
The  sizing  of  intermediate  mathematical 
operations  is  done  automatically  by  the  fixed  point 
toolbox.  Thus  many  of  the  same  models  can  be 
used  for  8  bit  or  64  bit  inputs. 

3.  The  resulting  VHDL  is  vendor  independent.  This 
allows  for  rapid  benchmarking  on  multiple 
architectures.  However,  hand  optimization  of 
VHDL  may  be  required  for  optimum  performance 
in  order  to  take  advantage  of  vendor  specific  IP. 

Implementing  fast  modulo  add,  subtract  and 
multiply  in  Simulink  for  HDL  generation 

Software implementations of modular reduction usually use
some form of trial division to compute the remainder.
Implementing modular arithmetic on integers with large
numbers of bits in an efficient manner requires special
numerical algorithms, such as
Montgomery reduction [6]. These algorithms avoid
division  by  q,  but  rather  scale  the  integers  so  that  many  of 
the  divisions  can  be  performed  by  a  power  of  2,  requiring 
only  simple  bit  shifts.  Our  SHE  requires  circuits  for  fast 
modulo  addition  and  multiplication  (to  directly  implement 
the  EvalAdd  and  EvalMult  mentioned  above).  In  addition, 
our  scheme  relies  heavily  on  the  Chinese  Remainder 
Transform  (CRT),  which  can  be  implemented  as  an 
EvalMult,  followed  by  an  FFT  [7]  that  uses  modulo  integer 
instead  of  complex  arithmetic.  The  implementation  of  the 
FFT  requires  us  to  perform  a  standard  radix  2  ‘Butterfly’ 
operation,  which  uses  one  addition,  one  subtraction  and  one 
multiply,  all  modulo  q.  Thus  we  need  to  implement  a 
modulo  subtraction  as  well  as  addition. 
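
To illustrate how Montgomery reduction replaces division by q with shifts and masks, the following is a minimal software sketch, not the Simulink implementation described in this report. It assumes an odd modulus q smaller than R = 2^k; the function names and the 20-bit toy parameters are hypothetical, and the three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
def montgomery_setup(q, k):
    """Precompute constants for R = 2**k; q must be odd and smaller than R."""
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r       # -q^{-1} mod R
    r2 = (r * r) % q                       # R^2 mod q, used to enter Montgomery form
    return q_neg_inv, r2

def redc(t, q, k, q_neg_inv):
    """Return t * R^{-1} mod q for 0 <= t < R*q using only a mask and a shift by k bits."""
    mask = (1 << k) - 1
    m = ((t & mask) * q_neg_inv) & mask    # m = (t mod R) * (-q^{-1}) mod R
    u = (t + m * q) >> k                   # exact division by R
    return u - q if u >= q else u          # u < 2q, so one conditional subtract suffices

def mont_mul(a, b, q, k):
    """Compute a*b mod q for 0 <= a, b < q via Montgomery form."""
    q_neg_inv, r2 = montgomery_setup(q, k)
    a_m = redc(a * r2, q, k, q_neg_inv)    # a*R mod q
    b_m = redc(b * r2, q, k, q_neg_inv)    # b*R mod q
    c_m = redc(a_m * b_m, q, k, q_neg_inv) # a*b*R mod q
    return redc(c_m, q, k, q_neg_inv)      # leave Montgomery form: a*b mod q

q, k = 1_000_003, 20                       # toy odd modulus of about twenty bits
product = mont_mul(123_456, 654_321, q, k)
```

The intermediate products here are about twice as wide as the modulus, which is the bit-width doubling the text goes on to describe.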

Initially, our selection of lattice-based HE led us to look at
relatively modest moduli, on the order of twenty
bits. An implementation of Montgomery reduction based
arithmetic would be relatively efficient, requiring hardware
multipliers on the order of 40 bits. However, later research
showed that for any reasonable security requirements our
SHE scheme would need O(64) bits for our modulus. Our
implementation of Montgomery arithmetic in Simulink
required us to double our bit width to represent intermediate
values in Montgomery form. We found that
there is an intrinsic limitation of 128 bit width in Simulink
even when using the fixed point toolbox. This meant that
we could not compile our multipliers for bit widths on the
order of 64 bits.

Additionally,  our  early  arithmetic  models  were  all  designed 
for  a  single  value  of  modulus  q  to  be  used  for  all  operations. 
During  the  development  of  our  SHE  scheme  we  found  that 
using  multiple  values  of  related  moduli  resulted  in  more 
efficient  implementations.  Thus  our  circuits  would  need  to 
operate with multiple (but not unlimited) values of q. As a
response to this we eliminated Montgomery arithmetic and
took a simpler approach to modulo addition and subtraction.

Figure  1  shows  the  Matlab  code  and  resulting  Simulink 
block  for  performing  a  streaming  EvalAdd  when  the  inputs 
are  constrained  to  be  less  than  a  given  modulus  q.  The 
model  can  operate  on  one  pair  of  inputs  every  clock  cycle. 
The  model  shown  does  not  have  any  additional  pipeline 
registers  for  simplicity,  but  they  can  be  added  to  the  model 
in  order  to  increase  the  maximum  clock  speed  of  the 
resulting  VHDL,  at  a  cost  of  additional  pipeline  stages.  In 
our  applications  we  expect  to  process  streams  of  input  on 
the  order  of  several  thousand  entries,  so  this  additional 
pipeline  latency  is  trivial. 

Figure 2 shows the Matlab and resulting Simulink block for
modulo  subtraction.  The  same  comments  about  pipelining 
the  circuit  apply. 
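
When both inputs are already below q, modulo addition and subtraction need only a comparison and one conditional correction. The following is a minimal software sketch of that idea, not the Simulink blocks themselves; the names `mod_add` and `mod_sub` are hypothetical.

```python
def mod_add(a, b, q):
    """Add two residues with 0 <= a, b < q using one compare and one conditional subtract."""
    c = a + b                  # at most 2q - 2, so one extra bit of width is enough
    return c - q if c >= q else c

def mod_sub(a, b, q):
    """Subtract two residues with 0 <= a, b < q using one compare and one conditional add."""
    c = a - b                  # at least -(q - 1)
    return c + q if c < 0 else c
```

Because neither function divides, each maps naturally onto an adder, a comparator and a multiplexer.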

Modulo  multiplication  is  a  much  more  complicated 
operation,  even  if  the  input  multiplier  and  multiplicand  are 
bounded  by  q.  Furthermore,  we  determined  in  our  earlier 
work  that  the  VHDL  code  generated  by  Simulink  for  large 
multiplications  is  not  automatically  pipelined,  so  the 
resulting  multiplies  severely  restrict  the  resulting  clock 
rates  of  the  circuits.  To  address  these  two  constraints,  we 
adopted  a  recently  developed  interleaved  modular 
multiplication  based  on  a  generalized  Barrett  reduction  [8]. 
This  multiplier  has  the  following  properties: 

1) Long words of bit length L can be represented by n
smaller words of bit length S (i.e. four 16 bit
words to represent a 64 bit modulus).

2) The multiplication is performed in n stages, where
each stage performs one modulo multiplication
that is L+S bits long. The stage can be pipelined to
perform one modulo multiply per clock cycle.

3) Each stage has a Barrett modulus performed on the
partial product, which reduces overall bit growth
of the partial products to L+S. Each stage requires
3 multiplies, and all divisions required by the
Barrett algorithm are implemented as simple bit
shifts.

4) One circuit can support multiple moduli. All
parameters that are specific to a given modulus can
be stored in a table and indexed.

    c = a+b;
    cgteq = (c>=q);
    cgte2q = (c>=(q+q));
    c(cgte2q) = c(cgte2q)-q;
    c(cgteq) = c(cgteq)-q;

Figure 1: Internal Structure of Simulink HDL ready Modulo Add primitive.

Figure 2: Internal Structure of Simulink HDL ready
Modulo Subtract primitive.

Figure 3: Top level structure of Simulink HDL ready two stage
Barrett Modulo Multiply primitive.

Figure  3  shows  the  structure  of  our  resulting  multiplier  for  a 
two  stage  operation  (i.e.  L  =  2S).  Figure  4  shows  the  model 
for  a  single  stage  in  the  pipeline.  All  stages  use  the  same 
model.  Again,  internal  pipelining  in  the  stage  is  not  shown. 
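
The staged structure can be sketched in software as follows. This is an illustrative sketch under stated assumptions, not the circuit of [8] or the Simulink model: it uses a plain Barrett reduction with the constant mu = floor(2^(L+S+1)/q), consumes the multiplier in S-bit words most significant first, and the function names and default sizes are hypothetical.

```python
def barrett_setup(q, shift):
    """Return mu = floor(2**shift / q); valid for reducing any 0 <= x < 2**shift."""
    return (1 << shift) // q

def barrett_reduce(x, q, mu, shift):
    """Reduce x modulo q with two multiplies, a shift and one conditional subtract."""
    t = (x * mu) >> shift      # quotient estimate, low by at most one
    r = x - t * q              # therefore 0 <= r < 2q
    return r - q if r >= q else r

def interleaved_mod_mul(a, b, q, L=64, S=16):
    """Compute a*b mod q for 0 <= a, b < q < 2**L, one S-bit word of b per stage."""
    n = L // S                 # number of stages (L is assumed to be a multiple of S)
    shift = L + S + 1          # every partial result stays below 2**(L+S+1)
    mu = barrett_setup(q, shift)
    mask = (1 << S) - 1
    acc = 0
    for i in reversed(range(n)):
        word = (b >> (i * S)) & mask
        # One stage: three multiplies (a*word, x*mu, t*q) and shift-only divisions.
        acc = barrett_reduce((acc << S) + a * word, q, mu, shift)
    return acc

q = (1 << 61) - 1              # illustrative modulus below 2**64
result = interleaved_mod_mul(q - 1, q - 2, q)
```

Supporting several moduli then amounts to keeping one `(q, mu)` pair per modulus in a table, in the spirit of property 4 above.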

Implementing fast CRT in Simulink for HDL
generation

As  mentioned  earlier,  our  scheme  uses  the  CRT,  which 
relies  heavily  on  modulo  arithmetic.  We  have  developed  a 
Simulink  model  for  performing  a  fast  CRT,  based  on  the 
primitives  discussed  above.  We  implemented  one  of  the 
standard pipeline decimation in frequency FFT
architectures, known as the Radix 2, Multipath Delay
Commutator [7]. The fundamental structure of the model is
identical  for  a  complex  version  that  computes  the  standard 
FFT,  and  the  modulo  arithmetic  version  that  computes  the 
FFT  portion  of  our  CRT.  The  only  difference  is  in  the 
Simulink  Model  that  implements  the  radix  2  butterfly. 

Figure  3  shows  the  structure  of  this  pipelined  CRT.  The 
design  trades  off  area  for  processing  speed.  For  an  N  point 
transform,  log2(N)  radix  2  Butterflies  are  required  (though 
the  last  butterfly  does  not  require  multiplies).  Additionally, 
3/2N-2  delay  elements  are  required.  The  data  needs  to  be 
presented  to  the  circuit  in  two  parallel  streams,  and  the 
resulting  output  is  in  bit  reverse  order. 
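
The arithmetic performed by such a transform can be sketched in software. The following is a minimal, non-pipelined decimation-in-frequency sketch using modulo-q butterflies; it covers only the FFT portion, and the function names, the toy modulus 17 and the root of unity 2 (which has order 8 modulo 17) are assumptions chosen for illustration.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    return int(format(index, f"0{bits}b")[::-1], 2)

def ntt_dif(x, omega, q):
    """Radix-2 decimation-in-frequency transform modulo q.

    omega must be a primitive len(x)-th root of unity modulo q and len(x) a
    power of two. The output is in bit-reversed order.
    """
    a = list(x)
    n = len(a)
    half = n // 2
    w_step = omega
    while half >= 1:
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(half):
                u, v = a[start + j], a[start + j + half]
                a[start + j] = (u + v) % q               # butterfly addition
                a[start + j + half] = ((u - v) * w) % q  # subtraction, then twiddle multiply
                w = (w * w_step) % q
        w_step = (w_step * w_step) % q
        half //= 2
    return a

q, omega = 17, 2
data = [1, 2, 3, 4, 5, 6, 7, 8]
transformed = ntt_dif(data, omega, q)
```

In the final stage the only twiddle factor is 1, which matches the remark that the last butterfly needs no multiplies.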

We  are  currently  in  the  process  of  analyzing  the 
performance  of  this  circuit,  and  determining  the  size  CRT 
operation  that  can  be  fit  into  our  candidate  FPGA 
architecture. Our analysis has shown that for high security
applications we may need to perform CRT operations on
vectors of up to 2^14 in length. For such large vector sizes, an
alternative design approach may be necessary in order to fit
the circuit within the FPGA.

Interim  Results 

Our presentation will include examples of our primitives
coded in Matlab and Simulink and examples of VHDL code
generated by the HDL Coder. We will also be able to show
timing results from Modelsim based simulations of the
resulting code, as well as actual timings using a Virtex 6 on
the Xilinx ML605 evaluation board.
Abstract

We  provide  the  first  constructions  of  identity-based  (injective)  trapdoor  functions.  Furthermore, 
they  are  lossy.  Constructions  are  given  both  with  pairings  (DLIN)  and  lattices  (LWE).  Our  lossy 
identity-based  trapdoor  functions  provide  an  automatic  way  to  realize,  in  the  identity-based  setting, 
many  functionalities  previously  known  only  in  the  public-key  setting.  In  particular  we  obtain  the  first 
deterministic  and  efficiently  searchable  IBE  schemes  and  the  first  hedged  IBE  schemes,  which  achieve 
best  possible  security  in  the  face  of  bad  randomness.  Underlying  our  constructs  is  a  new  definition, 
namely  partial  lossiness,  that  may  be  of  broader  interest. 
1 Introduction


A trapdoor function F specifies, for each public key pk, an injective, deterministic map $F_{pk}$ that can be
inverted given an associated secret key (trapdoor). The most basic measure of security is one-wayness.
The canonical example is RSA [55].

Suppose there is an algorithm that generates a “fake” public key pk* such that $F_{pk^*}$ is no longer
injective but has image much smaller than its domain and, moreover, given a public key, you can’t tell
whether it is real or fake. Peikert and Waters [52] call such a TDF lossy. Intuitively, $F_{pk}$ is close to a
function $F_{pk^*}$ that provides information-theoretic security. Lossiness implies one-wayness [52].

Lossy TDFs have quickly proven to be a powerful tool. Applications include IND-CCA [52],
deterministic [?], hedged [8] and selective-opening secure public-key encryption [?]. Lossy TDFs can be
constructed from DDH [52], QR [55], DLIN [55], DBDH [21], LWE [52] and HPS (hash proof systems) [10].
RSA was shown in [?] to be lossy under the Φ-hiding assumption of [20], leading to the first proof of
security of RSA-OAEP [13] without random oracles.

Lossy  TDFs  and  their  benefits  belong,  so  far,  to  the  realm  of  public-key  cryptography.  The  purpose 
of  this  paper  is  to  bring  them  to  identity-based  cryptography,  defining  and  constructing  identity-based 
TDFs  (IB-TDFs),  both  one-way  and  lossy.  We  see  this  as  having  two  motivations,  one  more  theoretical, 
the  other  more  applied,  yet  admittedly  both  foundational,  as  we  discuss  before  moving  further. 

Theoretical angle. Trapdoor functions are the primitive that began public key cryptography [?].
Public-key encryption was built from TDFs (via hardcore bits). Lossy TDFs enabled the first DDH
and lattice (LWE) based TDFs [52].

It is striking that identity-based cryptography developed entirely differently. The first realizations of
IBE [21, 30, 58] directly used randomization and were neither underlain by, nor gave rise to, any IB-TDFs.

We ask whether this asymmetry between the public-key and identity-based worlds (TDFs in one
but  not  the  other)  is  inherent.  This  seems  to  us  a  basic  question  about  the  nature  of  identity-based 
cryptography  that  is  worth  asking  and  answering. 

Application  angle.  Is  there  anything  here  but  idle  curiosity?  IBE  has  already  been  achieved  without 
IB-TDFs, so why go backwards to define and construct the latter? The answer is that lossy IB-TDFs
enable  new  applications  that  we  do  not  know  how  to  get  in  other  ways. 

Stepping back, identity-based cryptography [55] offers several advantages over its public-key
counterpart. Key management is simplified because an entity’s identity functions as their public key. Key
revocation issues that plague PKI can be handled in alternative ways, for example by using identity+date
as the key under which to encrypt to identity [21]. There is thus good motivation to go beyond
basics like IBE [21, 30, 58, 17, 18, 62, 36] and identity-based signatures [11, 32] to provide identity-based
counterparts of other public-key primitives.

Furthermore  we  would  like  to  do  this  in  a  systematic  rather  than  ad  hoc  way,  leading  us  to  seek 
tools  that  enable  the  transfer  of  multiple  functionalities  in  relatively  blackbox  ways.  The  applications  of 
lossiness in the public-key realm suggest that lossy IB-TDFs will be such a tool also in the identity-based
realm. As evidence we apply them to achieve identity-based deterministic encryption and
identity-based hedged encryption. The first, the counterpart of deterministic public-key encryption [?], allows
efficiently searchable identity-based encryption of database entries while maintaining the maximal possible
privacy, bringing the key-management benefits of the identity-based setting to this application. The
second, counterpart of hedged symmetric and public-key encryption [56, 8], makes IBE as resistant as
possible  in  the  face  of  low-quality  randomness,  which  is  important  given  the  widespread  deployment  of 
IBE  and  the  real  danger  of  bad-randomness  based  attacks  evidenced  by  the  ones  on  the  Sony  Playstation 
and  Debian  Linux.  We  hope  that  our  framework  will  facilitate  further  such  transfers. 

We  clarify  that  the  solutions  we  obtain  are  not  practical  but  they  show  that  the  security  goals  can  be 
achieved  in  principle,  which  was  not  at  all  clear  prior  to  our  work.  Allowed  random  oracles,  we  can  give 
solutions  that  are  much  more  efficient  and  even  practical. 


Contributions in brief. We define IB-TDFs and two associated security notions, one-wayness and
lossiness,  showing  that  the  second  implies  the  first. 

The first wave of IBE schemes was from pairings [?] but another is now emerging
from lattices [36, 29, 2, 3]. We aim accordingly to reach our ends with either route and do so successfully.
We provide lossy IB-TDFs from a standard pairings assumption, namely the Decision Linear (DLIN)
assumption of [?]. We also provide IB-TDFs based on Learning with Errors (LWE) [53], whose hardness
follows from the worst-case hardness of certain lattice-related problems [53, 50]. (The same assumption
underlies lattice-based IBE [36, 29, 2, 3] and public-key lossy TDFs [52].) None of these results relies on
random oracles.

Existing work brought us closer to the door with lattices, where one-way IB-TDFs can be built by
combining ideas from [36, 29, 2]. Based on techniques from [50, 15] we show how to make them lossy.
With pairings, however, it was unclear how to even get a one-way IB-TDF, let alone one that is lossy. We
adapt  the  matrix-based  framework  of  [52]  so  that  by  populating  matrix  entries  with  ciphertexts  of  a  very 
special  kind  of  anonymous  IBE  scheme  it  becomes  possible  to  implicitly  specify  per-identity  matrices 
defining  the  function.  No  existing  anonymous  IBE  has  the  properties  we  need  but  we  build  one  that  does 
based  on  methods  of  [25].  Our  results  with  pairings  are  stronger  because  the  lossy  branches  are  universal 
hash  functions  which  is  important  for  applications. 

Public-key lossy TDFs exist aplenty and IBE schemes do as well. It is natural to think one could
easily  combine  them  to  get  IB-TDFs.  We  have  found  no  simple  way  to  do  this.  Ultimately  we  do  draw 
from  both  sources  for  techniques  but  our  approaches  are  intrusive.  Let  us  now  look  at  our  contributions 
in  more  detail. 

New primitives and definitions. Public parameters pars and an associated master secret key having
been chosen, an IB-TDF F associates to any identity id a map $F_{pars,id}$, again injective and deterministic,
inversion being possible given a secret key derivable from id via the master secret key. One-wayness means
$F_{pars,id^*}$ is hard to invert on random inputs for an adversary-specified challenge identity id*. Importantly,
as in IBE, this must hold even when the adversary may obtain, via a key-derivation oracle, a decryption
key for any non-challenge identity of its choice [21]. This key-derivation capability contributes significantly
to the difficulty of realizing the primitive. As with IBE, security may be selective (the adversary must
specify id* before seeing pars) [28] or adaptive (no such restriction) [21].

The most direct analog of the definition of lossiness from the public-key setting would ask that there
be a way to generate “fake” parameters pars*, indistinguishable from the real ones, such that $F_{pars^*,id^*}$ is
lossy (has image smaller than domain). In the selective setting, the fake parameter generation algorithm
Pg* can take id* as input, making the goal achievable at least in principle, but in the adaptive setting it
is impossible to achieve, since, with id* not known in advance, Pg* is forced to make $F_{pars^*,id}$ lossy for
all id, something the adversary can immediately detect using its key-derivation oracle.

We ask whether there is an adaptation of the definition of lossiness that is achievable in the adaptive
case while sufficing for applications. Our answer is a definition of $\delta$-lossiness, a metric of partial lossiness
parameterized by the probability $\delta$ that $F_{pars^*,id^*}$ is lossy. The definition is unusual, involving an adversary
advantage that is the difference, not of two probabilities as is common in cryptographic metrics, but of
two differently weighted ones. We will achieve selective lossiness with degree $\delta = 1$, but in the adaptive
case the best possible is degree 1/poly with the polynomial depending on the number of key-derivation
queries of the adversary, and this is what we will achieve. We show that lossiness with degree $\delta$ implies
one-wayness, in both the selective and adaptive settings, as long as $\delta$ is at least 1/poly.

In summary, in the identity-based setting (ID) there are two notions of security, one-wayness (OW)
and lossiness (LS), each of which could be selective (S) or adaptive (A), giving rise to four kinds of
IB-TDFs. The left side of Figure 1 shows how they relate to each other and to the two kinds of TDFs, OW
and LS, in the public-key setting (PK). The un-annotated implications are trivial, ID-LS-A → ID-LS-S
meaning that $\delta$-lossiness of the first type implies $\delta$-lossiness of the other for all $\delta$. It is not however via
this implication that we achieve ID-LS-S, for, as the table shows, we achieve it with degree higher than
ID-LS-A.

[Diagram: implications among ID-LS-A, ID-OW-A, ID-LS-S, ID-OW-S, PK-LS and PK-OW; two arrows are annotated "Thm".]

Primitive    $\delta$    Achieved under
ID-LS-A      1/poly      DLIN, LWE
ID-LS-S      1           DLIN, LWE

Figure 1: Types of TDFs based on setting (PK=Public-key, ID=identity-based), security (OW=one-way, LS=lossy)
and whether the latter is selective (S) or adaptive (A). An arrow A → B in the diagram on the left means that
TDF of type B is implied by (can be constructed from) TDF of type A. Boxed TDFs are the ones we define and
construct. The table on the right shows the $\delta$ for which we prove $\delta$-lossiness and the assumptions used. In both
the S and A settings the $\delta$ we achieve is best possible and suffices for applications.

Closer  Look.  One’s  first  attempt  may  be  to  build  an  IB-TDF  from  an  IBE  scheme.  In  the  random 
oracle  (RO)  model,  this  can  be  done  by  a  method  of  [9],  namely  specify  the  coins  for  the  IBE  scheme  by 
hashing  the  message  with  the  RO.  It  is  entirely  unclear  how  to  turn  this  into  a  standard  model  construct 
and  it  is  also  unclear  how  to  make  it  lossy. 

To build IB-TDFs from lattices we consider starting from the public-key TDF of [52] (which is already
lossy) and trying to make it identity-based, but it is unclear how to do this. However, Gentry, Peikert and
Vaikuntanathan (GPV) [36] showed that the function $g_A : B^{n+m} \to \mathbb{Z}_q^m$ defined by $g_A(x, e) = A^T \cdot x + e$
is a TDF for appropriate choices of the domain and parameters, where matrix $A \in \mathbb{Z}_q^{n \times m}$ is a uniformly
random public key which is constructed together with a trapdoor as for example in [?]. We
make this function identity-based using the trapdoor extension and delegation methods introduced by
Cash, Hofheinz, Kiltz and Peikert [29], and improved in efficiency by Agrawal, Boneh and Boyen [2] and
Micciancio and Peikert [?]. Finally, we obtain a lossy IB-TDF by showing that this construction is
already lossy.
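
The arithmetic of the map $(x, e) \mapsto A^T \cdot x + e$ can be made concrete with a toy evaluation. The following sketch only evaluates the expression modulo q; the function name `g`, the modulus 11 and the 2-by-3 matrix are hypothetical, and the domain restrictions, trapdoor generation and parameter sizes that make the function a secure TDF are not modelled.

```python
def g(A, x, e, q):
    """Evaluate A^T * x + e (mod q) for an n-by-m matrix A, x of length n, e of length m."""
    n, m = len(A), len(A[0])
    assert len(x) == n and len(e) == m
    # Column j of A is row j of A^T, so output j is <column j of A, x> + e[j].
    return [(sum(A[i][j] * x[i] for i in range(n)) + e[j]) % q for j in range(m)]

q = 11                       # toy modulus, far too small for any security
A = [[1, 2, 3],
     [4, 5, 6]]              # n = 2 rows, m = 3 columns
x = [2, 1]
e = [1, 0, -1]               # "small" error vector
y = g(A, x, e, q)
```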

With  pairings  there  is  no  immediate  way  to  get  an  IB-TDF  that  is  even  one-way,  let  alone  lossy.  We 
aim  for  the  latter,  there  being  no  obviously  simpler  way  to  get  the  former.  In  the  selective  case  we  need 
to  ensure  that  the  function  is  lossy  on  the  challenge  identity  id*  yet  injective  on  others,  this  setup  being 
indistinguishable  from  the  one  where  the  function  is  always  injective.  Whereas  the  matrix  diagonals  in 
the construction of [52] consisted of ElGamal ciphertexts, in ours they are ciphertexts for identity id*
under an anonymous IBE scheme, the salient property being that the “anonymity” property should hide
whether the underlying ciphertext is to id* or is a random group element. Existing anonymous IBE
schemes, in particular that of Boyen and Waters (BW) [23], are not conducive and we create a new one.
A  side  benefit  is  a  new  anonymous  IBE  scheme  with  ciphertexts  and  private  keys  having  one  less  group 
element  than  BW  but  still  proven  secure  under  DLIN. 

A method of Boneh and Boyen [?] can be applied to turn selective into adaptive security but the
reduction incurs a factor that is equal to the size of the identity space and thus ultimately exponential
in the security parameter, so that adaptive security according to the standard asymptotic convention
would not have been achieved. To achieve it, we want to be able to “program” the public parameters
so that they will be lossy on about a 1/Q fraction of “random-ish” identities, where Q is the number of
key-derivation queries made by the attacker. Ideally, with probability around 1/Q all of (a successful)
attacker’s queries will land outside the lossy identity-space, but the challenge identity will land inside it
so that we achieve $\delta$-lossiness with $\delta$ around 1/Q.

This sounds similar to the approach of Waters [62] for achieving adaptively secure IBE but there are
some  important  distinctions,  most  notably  that  the  technique  of  Waters  is  information-theoretic  while 
ours  is  of  necessity  computational,  relying  on  the  DLIN  assumption.  In  the  reduction  used  by  Waters  the 
partitioning of the identities into two classes was based solely on the reduction algorithm’s internal view
of the public parameters; the parameters themselves were distributed independently of this partitioning
and thus the adversary view was the same as in a normal setup. In contrast, the partitioning in our
scheme will actually directly affect the parameters and how the system behaves. This is why we must
rely on a computational assumption to show that the partitioning is undetectable. A key novel feature
of our construction is the introduction of a system that will produce lossy public parameters for about a
1/Q fraction of the identities.

Applications. Deterministic PKE is a TDF providing the best possible privacy subject to being
deterministic, a notion called PRIV that is much stronger than one-wayness [7]. An application is encryption
of database records in a way that permits logarithmic-time search, improving upon the linear-time search
of PEKS [20]. Boldyreva, Fehr and O’Neill [16] show that lossy TDFs whose lossy branch is a universal
hash (called universal lossy TDFs) achieve (via the LHL [15, 39]) PRIV-security for message sequences
which  are  blocksources,  meaning  each  message  has  some  min-entropy  even  given  the  previous  ones,  which 
remains  the  best  result  without  ROs.  Deterministic  IBE  and  the  resulting  efficiently-searchable  IBE  are 
attractive  due  to  the  key-management  benefits.  We  can  achieve  them  because  our  DLIN-based  lossy 
IB-TDFs  are  also  universal  lossy.  (This  is  not  true,  so  far,  for  our  LWE  based  IB-TDFs.) 

To  provide  IND-CPA  security  in  practice,  IBE  relies  crucially  on  the  availability  of  fresh,  high-quality 
randomness.  This  is  fine  in  theory  but  in  practice  RNGs  (random  number  generators)  fail  due  to  poor 
entropy  gathering  or  bugs,  leading  to  prominent  security  breaches  [33  EH  [251  Hamsun  [631133].  Expecting 
systems  to  do  a  better  job  is  unrealistic.  Hedged  encryption  [8]  takes  poor  randomness  as  a  fact  of  life 
and  aims  to  deliver  best  possible  security  in  the  face  of  it,  providing  privacy  as  long  as  the  message 
together  with  the  “randomness”  have  some  min-entropy.  Hedged  PKE  was  achieved  in  [5]  by  combining 
IND-CPA  PKE  with  universal  lossy  TDFs.  We  can  adapt  this  to  IBE  and  combine  existing  (randomized) 
IBE schemes with our DLIN-based universal lossy IB-TDFs to achieve hedged IBE. This is attractive
given  the  widespread  use  of  IBE  in  practice  and  the  real  danger  of  randomness  failures. 

Both  applications  are  for  the  case  of  selective  security.  We  do  not  achieve  them  in  the  adaptive  case. 

Related  Work.  A  number  of  papers  have  studied  security  notions  of  trapdoor  functions  beyond 
traditional  one-wayness.  Besides  lossiness  [52]  there  is  Rosen  and  Segev’s  notion  of  correlated-product 
security  [S3,  and  Canetti  and  Dakdouk’s  extractable  trapdoor  functions  m •  The  notion  of  adaptive 
one-wayness for tag-based trapdoor functions from Kiltz, Mohassel and O’Neill [43] can be seen as the
special  case  of  our  selective  IB-TDF  in  which  the  adversary  is  denied  key-derivation  queries.  Security  in 
the  face  of  these  queries  was  one  of  the  main  difficulties  we  faced  in  realizing  IB-TDFs. 

Organization. We define IB-TDFs, one-wayness and $\delta$-lossiness in Section [2]. We also define extended
IB-TDFs, an abstraction that will allow us to unify and shorten the analyses for the selective and adaptive
security cases. In Section [3] we show that $\delta$-lossiness implies one-wayness as long as $\delta$ is at least $1/\mathrm{poly}$.
This allows us to focus on achieving $\delta$-lossiness. In Section [4] we provide our pairing-based schemes and
in Appendix [5] our lattice-based schemes. In Appendix [B] we sketch how to apply $\delta$-lossy IB-TDFs to
achieve deterministic and hedged IBE.

Subsequent work. Escala, Herranz, Libert and Rafols [33] provide an alternative definition of partial
lossiness  based  on  which  they  achieve  deterministic,  PRIV-secure  IBE  for  blocksources,  and  hedged  IBE, 
in  the  adaptive  case,  which  answers  an  open  question  from  our  work.  They  also  define  and  construct 
hierarchical  identity-based  (lossy)  trapdoor  functions. 


2  Definitions 

Notation and conventions. If $x$ is a vector then $|x|$ denotes the number of its coordinates and $x[i]$
denotes its $i$-th coordinate. Coordinates may be numbered $1, \ldots, |x|$ or $0, \ldots, |x| - 1$ as convenient. A
string $x$ is identified with a vector over $\{0,1\}$ so that $|x|$ denotes its length and $x[i]$ its $i$-th bit. The
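
proc Initialize($id$)  // $\mathrm{OW}_F$, $\mathrm{Real}_F$
  $(pars, msk) \xleftarrow{\$} \mathrm{F.Pg}$ ; $IS \leftarrow \emptyset$ ; $id^* \leftarrow id$
  Return $pars$

proc GetDK($id$)  // $\mathrm{OW}_F$, $\mathrm{Real}_F$
  $IS \leftarrow IS \cup \{id\}$
  $dk \xleftarrow{\$} \mathrm{F.Kg}(pars, msk, id)$
  Return $dk$

proc Ch($id$)  // $\mathrm{OW}_F$
  $id^* \leftarrow id$ ; $x \xleftarrow{\$} \mathrm{InSp}$
  $y \leftarrow \mathrm{F.Ev}(pars, id^*, x)$
  Return $y$

proc Finalize($x'$)  // $\mathrm{OW}_F$
  Return $((x' = x)$ and $(id^* \notin IS))$

proc Initialize($id$)  // $\mathrm{Lossy}_{F,LF,\ell}$
  $(pars, msk) \xleftarrow{\$} \mathrm{LF.Pg}(id)$ ; $IS \leftarrow \emptyset$ ; $id^* \leftarrow id$
  Return $pars$

proc GetDK($id$)  // $\mathrm{Lossy}_{F,LF,\ell}$
  $IS \leftarrow IS \cup \{id\}$
  $dk \xleftarrow{\$} \mathrm{LF.Kg}(pars, msk, id)$
  Return $dk$

proc Ch($id$)  // $\mathrm{Real}_F$, $\mathrm{Lossy}_{F,LF,\ell}$
  $id^* \leftarrow id$

proc Finalize($d'$)  // $\mathrm{Real}_F$
  Return $((d' = 1)$ and $(id^* \notin IS))$

proc Finalize($d'$)  // $\mathrm{Lossy}_{F,LF,\ell}$
  Return $((d' = 1)$ and $(id^* \notin IS)$ and $(\lambda(\mathrm{F.Ev}(pars, id^*, \cdot)) \ge \ell))$

Figure 2: Games defining one-wayness and $\delta$-lossiness of IBTDF F with associated sibling LF.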


empty string is denoted $\varepsilon$. If $S$ is a set then $|S|$ denotes its size, $S^a$ denotes the set of $a$-vectors over $S$,
$S^{a \times b}$ denotes the set of $a$ by $b$ matrices with entries in $S$, and so on. The $(i,j)$-th entry of a 2-dimensional
matrix $M$ is denoted $M[i,j]$ and the $(i,j,k)$-th entry of a 3-dimensional matrix $M$ is denoted $M[i,j,k]$.
If $M$ is an $n$ by $\mu$ matrix then $M[j,\cdot]$ denotes the vector $(M[j,1], \ldots, M[j,\mu])$. If $a = (a_1, \ldots, a_n)$
then $(a_1, \ldots, a_n) \leftarrow a$ means we parse $a$ as shown. Unless otherwise indicated, an algorithm may be
randomized. By $y \xleftarrow{\$} A(x_1, x_2, \ldots)$ we denote the operation of running $A$ on inputs $x_1, x_2, \ldots$ and fresh
coins and letting $y$ denote the output. We denote by $[A(x_1, x_2, \ldots)]$ the set of all possible outputs of $A$
on inputs $x_1, x_2, \ldots$. The (Kronecker) delta function $\Delta$ is defined by $\Delta(a,b) = 1$ if $a = b$ and 0 otherwise.
If $a, b$ are equal-length vectors of reals then $\langle a, b \rangle = a[1]b[1] + \cdots + a[|a|]b[|b|]$ denotes their inner product.

Games. A game (see Figure 2 for an example) has an Initialize procedure, procedures to respond
to adversary oracle queries, and a Finalize procedure. To execute a game $G$ with an adversary
$A$ means to run the adversary and answer its oracle queries by the corresponding procedures of $G$. The
adversary  must  make  exactly  one  query  to  Initialize,  this  being  its  first  oracle  query.  (This  means  the 
adversary  can  give  Initialize  an  input,  an  extension  of  the  usual  convention  M-)  It  must  make  exactly 
one query to Finalize, this being its last oracle query. The reply to this query, denoted $G^A$, is called the
output of the game, and we let “$G^A$” denote the event that this game output takes value true. Boolean
flags  are  assumed  initialized  to  false. 

IBTDFs. An identity-based trapdoor function (IBTDF) is a tuple $F = (\mathrm{F.Pg}, \mathrm{F.Kg}, \mathrm{F.Ev}, \mathrm{F.Ev}^{-1})$ of
algorithms with associated input space InSp and identity space IDSp. The parameter generation algorithm
F.Pg takes no input and returns common parameters $pars$ and a master secret key $msk$. On input
$pars, msk, id$, the key generation algorithm F.Kg produces a decryption key $dk$ for identity $id$. For any
$pars$ and $id \in \mathrm{IDSp}$, the deterministic evaluation algorithm F.Ev defines a function $\mathrm{F.Ev}(pars, id, \cdot)$ with
domain InSp. We require correct inversion: for any $pars$, any $id \in \mathrm{IDSp}$ and any $dk \in [\mathrm{F.Kg}(pars, id)]$,
the deterministic inversion algorithm $\mathrm{F.Ev}^{-1}$ defines a function that is the inverse of $\mathrm{F.Ev}(pars, id, \cdot)$,
meaning $\mathrm{F.Ev}^{-1}(pars, id, dk, \mathrm{F.Ev}(pars, id, x)) = x$ for all $x \in \mathrm{InSp}$.

E-IBTDF. To unify and shorten the selective and adaptive cases of our analyses it is useful to define
and specify a more general primitive. An extended IBTDF (E-IBTDF) $E = (\mathrm{E.Pg}, \mathrm{E.Kg}, \mathrm{E.Ev}, \mathrm{E.Ev}^{-1})$
consists of four algorithms that are just like the ones for an IBTDF except that E.Pg takes an additional
auxiliary input from an auxiliary input space AxSp. Fixing a particular auxiliary input $aux \in \mathrm{AxSp}$ for
E.Pg results in an IBTDF scheme that we denote $E(aux)$ and call the IBTDF induced by $aux$. Not all
these induced schemes need, however, satisfy the correct inversion requirement. If the one induced by
$aux$ does, we say that $aux$ grants invertibility. Looking ahead, we will build an E-IBTDF and then obtain
our IBTDF as the one induced by a particular auxiliary input, the other induced schemes being the basis
of the siblings and being used in the proof.

One-wayness. One-wayness of IBTDF $F = (\mathrm{F.Pg}, \mathrm{F.Kg}, \mathrm{F.Ev}, \mathrm{F.Ev}^{-1})$ is defined via game $\mathrm{OW}_F$ of
Figure 2. The adversary is allowed only one query to its challenge oracle Ch. The advantage of such an
adversary $I$ is $\mathbf{Adv}^{\mathrm{ow}}_{F}(I) = \Pr[\mathrm{OW}_F^I]$.
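
The mechanics of the one-wayness game can be sketched in Python. This is only an illustration under stated assumptions: `ToyIBTDF` is a hypothetical, deliberately insecure stand-in (its "trapdoor" is a public hash of the identity), and all class and function names are introduced here rather than taken from the paper. The point is the bookkeeping of `IS`, `id*` and the winning condition in Finalize.

```python
import hashlib
import random

N = 2 ** 16  # toy input space InSp = {0, ..., N-1} (an assumption for illustration)

class ToyIBTDF:
    """Insecure stand-in for an IBTDF: Ev shifts x by a public hash of id."""
    def pg(self):
        return "pars", "msk"
    def kg(self, pars, msk, ident):
        return self._shift(ident)
    def ev(self, pars, ident, x):
        return (x + self._shift(ident)) % N
    def ev_inv(self, pars, ident, dk, y):
        return (y - dk) % N
    @staticmethod
    def _shift(ident):
        return int.from_bytes(hashlib.sha256(ident.encode()).digest()[:2], "big")

class GameOW:
    """Game OW_F: Initialize, GetDK, Ch and Finalize as in Figure 2."""
    def __init__(self, F):
        self.F = F
    def initialize(self, ident):
        self.pars, self.msk = self.F.pg()
        self.IS, self.id_star = set(), ident
        return self.pars
    def get_dk(self, ident):
        self.IS.add(ident)                      # record every key-derivation query
        return self.F.kg(self.pars, self.msk, ident)
    def ch(self, ident):
        self.id_star = ident
        self.x = random.randrange(N)            # challenge preimage
        return self.F.ev(self.pars, self.id_star, self.x)
    def finalize(self, x_guess):
        return x_guess == self.x and self.id_star not in self.IS

def run(game, adversary, ident):
    pars = game.initialize(ident)
    return game.finalize(adversary(game, pars, ident))

def shift_adversary(game, pars, ident):
    # Wins because the toy function is not one-way: the shift is public.
    y = game.ch(ident)
    return (y - ToyIBTDF._shift(ident)) % N

def key_asking_adversary(game, pars, ident):
    # Inverts correctly but loses: asking for the key of id* puts id* into IS.
    dk = game.get_dk(ident)
    y = game.ch(ident)
    return game.F.ev_inv(pars, ident, dk, y)
```

Both adversaries here are selective-id in the sense used in the text, since they pass the same identity to Initialize and Ch.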

Selective versus adaptive ID. We are interested in both these variants for all the notions we consider.
To avoid a proliferation of similar definitions, we capture the variants instead via different adversary
classes relative to the same game. To exemplify, consider game $\mathrm{OW}_F$ of Figure 2. Say that an adversary
$A$ is selective-id if the identity $id$ in its queries to Initialize and Ch is always the same, and say it
is adaptive-id if this is not necessarily true. Selective-id security for one-wayness is thus captured by
restricting attention to selective-id adversaries and full (adaptive-id) security by allowing adaptive-id
adversaries. Now, adopt the same definitions of selective and adaptive adversaries relative to any game
that provides procedures called Initialize and Ch, regardless of how these procedures operate. In this
way, other notions we will introduce, including partial lossiness defined via games also in Figure 2, will
automatically have selective-id and adaptive-id security versions.

Partial lossiness. We first provide the formal definitions and later explain them and their relation
to standard definitions. If $f$ is a function with domain a (non-empty) set $\mathrm{Dom}(f)$ then its image is
$\mathrm{Im}(f) = \{ f(x) : x \in \mathrm{Dom}(f) \}$. We define the lossiness $\lambda(f)$ of $f$ via

\[ \lambda(f) = \lg \frac{|\mathrm{Dom}(f)|}{|\mathrm{Im}(f)|} \quad\text{or equivalently}\quad |\mathrm{Im}(f)| = |\mathrm{Dom}(f)| \cdot 2^{-\lambda(f)} . \]

We say that $f$ is $\ell$-lossy if $\lambda(f) \ge \ell$. Let $F = (\mathrm{F.Pg}, \mathrm{F.Kg}, \mathrm{F.Ev}, \mathrm{F.Ev}^{-1})$ be an IBTDF with
associated input space InSp and identity space IDSp. A sibling for F is an E-IBTDF $LF = (\mathrm{LF.Pg}, \mathrm{LF.Kg},
\mathrm{F.Ev}, \mathrm{F.Ev}^{-1})$ whose evaluation and inversion algorithms, as the notation indicates, are those of F and
whose auxiliary input space is IDSp. Algorithm LF.Pg will use this input in the selective-id case and
ignore it in the adaptive-id case. Consider games $\mathrm{Real}_F$ and $\mathrm{Lossy}_{F,LF,\ell}$ of Figure 2. The first uses the
real parameter and key-generation algorithms while the second uses the sibling ones. A los-adversary $A$
is allowed just one Ch query, and the games do no more than record the challenge identity $id^*$. The
advantage of the adversary is not, as usual, the difference in the probabilities that the games return true,
but is instead parameterized by a probability $\delta \in [0,1]$ and defined via

\[ \mathbf{Adv}^{\delta\text{-los}}_{F,LF,\ell}(A) = \delta \cdot \Pr[\mathrm{Real}_F^A] - \Pr[\mathrm{Lossy}^A_{F,LF,\ell}] . \tag{1} \]
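
For a function on a small finite domain the lossiness can be computed by brute force. The following minimal sketch (the helper name `lossiness` and the two example functions are assumptions made here for illustration) evaluates the base-2 logarithm of |Dom(f)| / |Im(f)| directly.

```python
from math import log2

def lossiness(f, domain):
    """Lossiness of f on a finite domain: lg(|Dom(f)| / |Im(f)|)."""
    domain = list(domain)
    image = {f(x) for x in domain}
    return log2(len(domain) / len(image))

# Reducing 16 inputs modulo 4 leaves 4 images, so two bits are lost.
two_bits_lost = lossiness(lambda x: x % 4, range(16))
# An injective function loses nothing.
nothing_lost = lossiness(lambda x: x, range(16))
```

In the terminology of the text, the first example function is 2-lossy (and also 1-lossy), while the injective one is only 0-lossy.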

Discussion.  The  PW  [52]  notion  of  lossy  TDFs  in  the  public-key  setting  asks  for  an  alternative  “sibling” 
key-generation  algorithm,  producing  a  public  key  but  no  secret  key,  such  that  two  conditions  hold.  The 
first,  which  is  combinatorial,  asks  that  the  functions  defined  by  sibling  keys  are  lossy.  The  second,  which  is 
computational,  asks  that  real  and  sibling  keys  are  indistinguishable.  The  first  change  for  the  IB  setting  is 
that  one  needs  an  alternative  parameter  generation  algorithm  which  produces  not  only  pars  but  a  master 
secret  key  msk,  and  an  alternative  key-generation  algorithm  that,  based  on  msk,  can  issue  decryption  keys 
to  users.  Now  we  would  like  to  ask  that  the  function  F.Ev  (pars,  id* ,  •)  be  lossy  on  the  challenge  identity 
id*  when  pars  is  generated  via  LF.Pg,  but,  in  the  adaptive-id  case,  we  do  not  know  id*  in  advance.  Thus 
the  requirement  is  made  via  the  games. 

We would like to define the advantage normally, meaning with $\delta = 1$, but the resulting notion is not
achievable in the adaptive-id case. (This can be shown via attack.) With the relaxation, a low (close
to zero) advantage means that the probability that the adversary finds a lossy identity $id^*$ and then
outputs 1 is less than the probability that it merely outputs 1 by a factor not much less than $\delta$. Roughly,
it means that a $\delta$ fraction of identities are lossy. The advantage represents the computational loss while
$\delta$ represents a necessary information-theoretic loss.

IBE. Recall that an IBE scheme $IBE = (\mathrm{IBE.Pg}, \mathrm{IBE.Kg}, \mathrm{IBE.Enc}, \mathrm{IBE.Dec})$ is a tuple of algorithms
with associated message space InSp and identity space IDSp. The parameter generation algorithm
IBE.Pg takes no input and returns common parameters $pars$ and a master secret key $msk$. On input
$pars, msk, id$, the key generation algorithm IBE.Kg produces a decryption key $dk$ for identity $id$.
On input $pars$, $id \in \mathrm{IDSp}$ and a message $M \in \mathrm{InSp}$ the encryption algorithm IBE.Enc returns a
ciphertext. The decryption algorithm IBE.Dec is deterministic. The scheme has decryption error $\epsilon$ if
$\Pr[\mathrm{IBE.Dec}(pars, id, dk, \mathrm{IBE.Enc}(pars, id, M)) \ne M] \le \epsilon$ for all $pars$, all $id \in \mathrm{IDSp}$, all $dk \in [\mathrm{IBE.Kg}(pars, id)]$
and all $M \in \mathrm{InSp}$. We say that IBE is deterministic if IBE.Enc is deterministic. A deterministic IBE scheme
is identical to an IBTDF.


3  Implications  of  Partial  Lossiness 


Theorem 3.2 shows that partial lossiness implies one-wayness. We discuss other applications in
Appendix [B]. We first need a simple lemma.

Lemma 3.1 Let $f$ be a function with non-empty domain $\mathrm{Dom}(f)$. Then for any adversary $A$

\[ \Pr[ A(y) = x : x \xleftarrow{\$} \mathrm{Dom}(f) ; y \leftarrow f(x) ] \le 2^{-\lambda(f)} . \]


Proof of Lemma 3.1 For $y \in \mathrm{Im}(f)$ let $f^{-1}(y)$ be the set of all $x \in \mathrm{Dom}(f)$ such that $f(x) = y$. The
probability in question is

\[ \sum_{y \in \mathrm{Im}(f)} \Pr[ A(y) = x \mid f(x) = y ] \cdot \Pr[ f(x) = y ]
   \le \sum_{y \in \mathrm{Im}(f)} \frac{1}{|f^{-1}(y)|} \cdot \frac{|f^{-1}(y)|}{|\mathrm{Dom}(f)|}
   = \frac{|\mathrm{Im}(f)|}{|\mathrm{Dom}(f)|} = 2^{-\lambda(f)} , \]

where the probability is over $x$ chosen at random from $\mathrm{Dom}(f)$ and the coins of $A$ if any. (Since $A$ is
unbounded, it can be assumed wlog to be deterministic.) ∎
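
The bound of Lemma 3.1 is tight, which can be checked exhaustively on a small example. The sketch below is an illustration only (the function name `best_inversion_probability` and the example function are assumptions): it builds a deterministic inverter that outputs one fixed preimage for each image point and counts exactly how often it recovers a uniformly chosen x.

```python
from collections import defaultdict
from fractions import Fraction

def best_inversion_probability(f, domain):
    """Exact success probability of an inverter that, given y, outputs one
    fixed preimage of y, when x is uniform over the finite domain."""
    domain = list(domain)
    preimages = defaultdict(list)
    for x in domain:
        preimages[f(x)].append(x)
    guess = {y: xs[0] for y, xs in preimages.items()}   # one guess per image point
    hits = sum(1 for x in domain if guess[f(x)] == x)
    return Fraction(hits, len(domain))

# f(x) = x mod 4 on 16 inputs has |Im(f)| / |Dom(f)| = 4 / 16.
success = best_inversion_probability(lambda x: x % 4, range(16))
```

Each image point contributes exactly one hit, so the success probability equals |Im(f)| / |Dom(f)|, matching the right-hand side of the lemma.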


Theorem 3.2 [$\delta$-lossiness implies one-wayness] Let $F = (\mathrm{F.Pg}, \mathrm{F.Kg}, \mathrm{F.Ev}, \mathrm{F.Ev}^{-1})$ be an IBTDF with
associated input space InSp. Let $LF = (\mathrm{LF.Pg}, \mathrm{LF.Kg}, \mathrm{F.Ev}, \mathrm{F.Ev}^{-1})$ be a lossy sibling for F. Let $\delta > 0$ and
let $\ell \ge 0$. Then for any ow-adversary $I$ there is a los-adversary $A$ such that

\[ \mathbf{Adv}^{\mathrm{ow}}_{F}(I) \le \frac{\mathbf{Adv}^{\delta\text{-los}}_{F,LF,\ell}(A) + 2^{-\ell}}{\delta} . \tag{2} \]

The running time of $A$ is that of $I$ plus the time for a computation of F.Ev. If $I$ is a selective adversary
then so is $A$. ∎


In asymptotic terms, the theorem says that $\delta$-lossiness implies one-wayness as long as $\delta^{-1}$ is bounded
above by a polynomial in the security parameter and $\ell$ is super-logarithmic. This means $\delta$ need only be
non-negligible. The last sentence of the theorem, saying that if $I$ is selective then so is $A$, is important
because it says that the theorem covers both the selective and adaptive security cases, meaning selective
$\delta$-lossiness implies selective one-wayness and adaptive $\delta$-lossiness implies adaptive one-wayness.


Proof of Theorem 3.2 Adversary $A$ runs $I$. When $I$ makes query Initialize($id$), adversary $A$ does
the same, obtaining $pars$ and returning this to $I$. Adversary $A$ answers $I$’s queries to its GetDK oracle
via its own oracle of the same name. When $I$ makes its (single) Ch query $id^*$, adversary $A$ also makes
query Ch($id^*$). Additionally, it picks $x$ at random from InSp and returns $y = \mathrm{F.Ev}(pars, id^*, x)$ to $I$.
The latter eventually halts with output $x'$. Adversary $A$ returns 1 if $x' = x$ and 0 otherwise. By design
we clearly have $\Pr[\mathrm{Real}_F^A] = \mathbf{Adv}^{\mathrm{ow}}_{F}(I)$. But game $\mathrm{Lossy}_{F,LF,\ell}$ returns true only if $\mathrm{F.Ev}(pars, id^*, \cdot)$ is
$\ell$-lossy, in which case the probability that $x = x'$ is small by Lemma 3.1. In detail, assuming wlog that $I$
never queries $id^*$ to GetDK, we have

\[ \Pr[\mathrm{Lossy}^A_{F,LF,\ell}] = \Pr[ x = x' \mid \lambda(\mathrm{F.Ev}(pars, id^*, \cdot)) \ge \ell ] \cdot \Pr[ \lambda(\mathrm{F.Ev}(pars, id^*, \cdot)) \ge \ell ] \]
\[ \le \Pr[ x = x' \mid \lambda(\mathrm{F.Ev}(pars, id^*, \cdot)) \ge \ell ] \le 2^{-\ell} , \]

the last inequality by Lemma 3.1 applied to the function $f = \mathrm{F.Ev}(pars, id^*, \cdot)$. From Equation (1) we
have

\[ \mathbf{Adv}^{\delta\text{-los}}_{F,LF,\ell}(A) = \delta \cdot \Pr[\mathrm{Real}_F^A] - \Pr[\mathrm{Lossy}^A_{F,LF,\ell}] \ge \delta \cdot \mathbf{Adv}^{\mathrm{ow}}_{F}(I) - 2^{-\ell} . \]

Equation (2) follows. ∎ In Appendix [B] we discuss the application to deterministic and hedged IBE.
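
The final rearrangement in the proof can be spelled out as a short worked derivation. The numeric values in the last line are hypothetical, chosen here only to show the orders of magnitude involved; they are not from the paper.

```latex
\begin{align*}
\mathbf{Adv}^{\delta\text{-los}}_{F,LF,\ell}(A)
  &\ge \delta \cdot \mathbf{Adv}^{\mathrm{ow}}_{F}(I) - 2^{-\ell} \\
\delta \cdot \mathbf{Adv}^{\mathrm{ow}}_{F}(I)
  &\le \mathbf{Adv}^{\delta\text{-los}}_{F,LF,\ell}(A) + 2^{-\ell} \\
\mathbf{Adv}^{\mathrm{ow}}_{F}(I)
  &\le \frac{\mathbf{Adv}^{\delta\text{-los}}_{F,LF,\ell}(A) + 2^{-\ell}}{\delta}
  && \text{(dividing by } \delta > 0\text{)} \\
\text{e.g. } \delta = 2^{-10},\ \ell = 80,\ \mathbf{Adv}^{\delta\text{-los}} = 2^{-60}:
  \quad \mathbf{Adv}^{\mathrm{ow}}_{F}(I)
  &\le 2^{10}\left(2^{-60} + 2^{-80}\right) < 2^{-49}
\end{align*}
```

The division by $\delta$ is where the polynomial loss enters, which is why the asymptotic statement needs $\delta^{-1}$ to be polynomially bounded.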


4  IB-TDFs  from  pairings 

In Section [3] we showed that $\delta$-lossiness implies one-wayness in both the selective and adaptive cases. We
now show how to achieve $\delta$-lossiness using pairings.

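Setup. Throughout we fix a bilinear map $e: \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ where $\mathbb{G}, \mathbb{G}_T$ are groups of prime order $p$.
By $1, 1_T$ we denote the identity elements of $\mathbb{G}, \mathbb{G}_T$, respectively. By $\mathbb{G}^* = \mathbb{G} - \{1\}$ we denote the set of
generators of $\mathbb{G}$. The advantage of a dlin-adversary $B$ is

\[ \mathbf{Adv}^{\mathrm{dlin}}(B) = 2 \Pr[\mathrm{DLIN}^B] - 1 , \]

where game DLIN is as follows. The Initialize procedure picks $g, \hat g$ at random from $\mathbb{G}^*$, $s$ at random
from $\mathbb{Z}_p^*$, $\hat s$ at random from $\mathbb{Z}_p$ and $X$ at random from $\mathbb{G}$. It picks a random bit $b$. If $b = 1$ it lets
$T \leftarrow X^{s + \hat s}$ and otherwise picks $T$ at random from $\mathbb{G}$. It returns $(g, \hat g, g^s, \hat g^{\hat s}, X, T)$ to the adversary $B$.
The adversary outputs a bit $b'$ and Finalize, given $b'$, returns true if $b = b'$ and false otherwise. For
integer $\mu \ge 1$, vectors $\mathbf{U} \in \mathbb{G}^{\mu+1}$ and $y \in \mathbb{Z}_p^{\mu+1}$, and vector $id \in \mathbb{Z}_p^{\mu}$ we let

\[ \overline{id} = (1, id[1], \ldots, id[\mu]) \in \mathbb{Z}_p^{\mu+1} \quad\text{and}\quad \mathcal{H}(\mathbf{U}, id) = \prod_{k=0}^{\mu} \mathbf{U}[k]^{\overline{id}[k]} . \]

$\mathcal{H}$ is the BB hash function H3 when $\mu = 1$, and the Waters’ one [S3] when $\mathrm{IDSp} = \{0,1\}^{\mu}$ and an
$id \in \mathrm{IDSp}$ is viewed as a $\mu$-vector over $\mathbb{Z}_p$. We also let

\[ f(y, id) = \sum_{k=0}^{\mu} y[k]\, \overline{id}[k] \quad\text{and}\quad \bar f(y, id) = f(y, id) \bmod p . \]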
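
The shape of the DLIN game can be sketched in Python over a tiny group. Everything about the group is an assumption made for illustration: it is the order-1019 subgroup of squares modulo 2039, it has no pairing, and it is far too small to be secure, which is exactly why the brute-force adversary below wins. Function names are hypothetical.

```python
import random

Q = 2039   # toy prime modulus with Q = 2*P + 1 (assumption, illustration only)
P = 1019   # prime order of the subgroup of squares modulo Q

def rand_elem():
    """Uniform element of G, the order-P subgroup of squares in Z_Q^*."""
    return pow(random.randrange(1, Q), 2, Q)

def rand_gen():
    """Uniform element of G* = G - {1}; every such element generates G."""
    while True:
        g = rand_elem()
        if g != 1:
            return g

def dlin_initialize(b):
    """Build the tuple (g, ghat, g^s, ghat^shat, X, T) for challenge bit b."""
    g, ghat = rand_gen(), rand_gen()
    s = random.randrange(1, P)        # s from Z_p^*
    shat = random.randrange(P)        # shat from Z_p
    X = rand_elem()
    T = pow(X, (s + shat) % P, Q) if b == 1 else rand_elem()
    return (g, ghat, pow(g, s, Q), pow(ghat, shat, Q), X, T)

def brute_force_adversary(instance):
    """Recovers s and shat by exhaustive search, feasible only in a toy group."""
    g, ghat, gs, ghs, X, T = instance
    s = next(e for e in range(P) if pow(g, e, Q) == gs)
    shat = next(e for e in range(P) if pow(ghat, e, Q) == ghs)
    return 1 if T == pow(X, (s + shat) % P, Q) else 0

def play_dlin(adversary):
    """One run of the game; returns True when the adversary guesses b."""
    b = random.randrange(2)
    return adversary(dlin_initialize(b)) == b
```

When b = 1 the brute-force adversary is always right; when b = 0 it errs only if the random T happens to equal the real value, so its advantage is close to 1 in this toy setting. In a group of cryptographic size the exhaustive search is infeasible, which is what the DLIN assumption captures.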


4.1  Overview 

In  the  Peikert- Waters  [52]  design,  the  matrix  entries  are  ciphertexts  of  an  underlying  homomorphic 
encryption  scheme,  and  the  function  output  is  a  vector  of  ciphertexts  of  the  same  scheme.  We  begin 
by  presenting  an  IBE  scheme,  that  we  call  the  basic  IBE  scheme,  such  that  the  function  outputs  of  our 
eventual  IB-TDF  will  be  a  vector  of  ciphertexts  of  this  IBE  scheme.  Towards  building  the  IB-TDF,  the 
first  difficulty  we  run  into  in  setting  up  the  matrix  is  that  ciphertexts  depend  on  the  identity  and  we 
cannot  have  a  different  matrix  for  every  identity.  Thus,  our  approach  is  more  intrusive.  We  will  have  many 
matrices  which  contain  certain  “atoms”  from  which,  given  an  identity,  one  can  reconstruct  ciphertexts 
of  the  IBE  scheme.  The  result  of  this  intrusive  approach  is  that  security  of  the  IB-TDF  relies  on  more 
than security of the base IBE scheme. Our ciphertext pseudorandomness lemma (Lemma 4.1) shows
something stronger, namely that even the atoms from which the ciphertexts are created look random
under DLIN. This will be used to establish Lemma 4.2, which moves from the real to the lossy setup.
The  heart  of  the  argument  is  the  proofs  of  the  lemmas,  which  are  in  the  appendices. 

We  introduce  a  general  framework  that  allows  us  to  treat  both  the  selective-id  and  adaptive-id  cases  in 
as  unified  a  way  as  possible.  We  will  first  specify  an  E-IBTDF.  The  selective-id  and  adaptive-id  IB-TDFs 
are  obtained  via  different  auxiliary  inputs.  Furthermore,  the  siblings  used  to  prove  lossiness  also  emanate 
from  this  E-IBTDF.  With  this  approach,  the  main  lemmas  become  usable  in  both  the  selective-id  and 
adaptive-id cases with only minor adjustments for the latter due to artificial aborts. This saves us from
repeating  similar  arguments  and  significantly  compacts  the  proof. 
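

4.2 Our basic IBE scheme

We associate to any integer $\mu \ge 1$ and any identity space $\mathrm{IDSp} \subseteq \mathbb{Z}_p^{\mu}$ an IBE scheme $\mathrm{IBE}[\mu, \mathrm{IDSp}]$ that
has message space $\{0,1\}$ and algorithms as follows:

1. Parameters: Algorithm $\mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Pg}$ lets $g \xleftarrow{\$} \mathbb{G}^*$ ; $t \xleftarrow{\$} \mathbb{Z}_p^*$ ; $\hat g \leftarrow g^t$. It then lets
$H, \hat H \xleftarrow{\$} \mathbb{G}$ ; $\mathbf{U}, \hat{\mathbf{U}} \xleftarrow{\$} \mathbb{G}^{\mu+1}$. It returns $pars = (g, \hat g, H, \hat H, \mathbf{U}, \hat{\mathbf{U}})$ as the public parameters
and $msk = t$ as the master secret key.

2. Key generation: Given parameters $(g, \hat g, H, \hat H, \mathbf{U}, \hat{\mathbf{U}})$, master secret $t$ and identity $id \in \mathrm{IDSp}$, algorithm
$\mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Kg}$ returns decryption key $(D_1, D_2, D_3, D_4)$ computed by letting $r, \hat r \xleftarrow{\$} \mathbb{Z}_p$ and setting

\[ D_1 \leftarrow \mathcal{H}(\mathbf{U}, id)^{tr} \cdot H^{t \hat r} ; \quad D_2 \leftarrow \mathcal{H}(\hat{\mathbf{U}}, id)^{r} \cdot \hat H^{\hat r} ; \quad D_3 \leftarrow g^{-tr} ; \quad D_4 \leftarrow g^{-t \hat r} . \]

3. Encryption: Given parameters $(g, \hat g, H, \hat H, \mathbf{U}, \hat{\mathbf{U}})$, identity $id \in \mathrm{IDSp}$ and message $M \in \{0,1\}$, algorithm
$\mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Enc}$ returns ciphertext $(C_1, C_2, C_3, C_4)$ computed as follows. If $M = 0$ then it lets
$s, \hat s \xleftarrow{\$} \mathbb{Z}_p$ and $C_1 \leftarrow g^{s}$ ; $C_2 \leftarrow \hat g^{\hat s}$ ; $C_3 \leftarrow \mathcal{H}(\mathbf{U}, id)^{s} \cdot \mathcal{H}(\hat{\mathbf{U}}, id)^{\hat s}$ ; $C_4 \leftarrow H^{s} \hat H^{\hat s}$.
If $M = 1$ it lets $C_1, C_2, C_3, C_4 \xleftarrow{\$} \mathbb{G}$.

4. Decryption: Given parameters $(g, \hat g, H, \hat H, \mathbf{U}, \hat{\mathbf{U}})$, identity $id \in \mathrm{IDSp}$, decryption key $(D_1, D_2, D_3, D_4)$
for $id$ and ciphertext $(C_1, C_2, C_3, C_4)$, algorithm $\mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Dec}$ returns 0 if
$e(C_1, D_1)\, e(C_2, D_2)\, e(C_3, D_3)\, e(C_4, D_4) = 1_T$ and 1 otherwise.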

This  scheme  has  non-zero  decryption  error  (at  most  2/p)  yet  our  IBTDF  will  have  zero  inversion  error. 
This  scheme  turns  out  to  be  IND-CPA+ANON-CPA  although  we  will  not  need  this  in  what  follows. 
Instead  we  will  have  to  consider  a  distinguishing  game  related  to  this  IBE  scheme  and  our  IBTDF.  In 
Appendix [A] we give a (more natural) variant of $\mathrm{IBE}[\mu, \mathrm{IDSp}]$ that is more efficient and encrypts strings
rather  than  bits.  The  improved  IBE  scheme  can  still  be  proved  IND-CPA+ANON-CPA  but  it  cannot  be 
used  for  our  purpose  of  building  IB-TDFs. 
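
Why an encryption of 0 decrypts to 0 can be checked directly from bilinearity. The worked computation below is a sketch under two assumptions: the key components are taken to be $D_3 = g^{-tr}$ and $D_4 = g^{-t\hat r}$ (matching the per-coordinate keys of Section 4.3), and $A = \mathcal{H}(\mathbf{U}, id)$, $\hat A = \mathcal{H}(\hat{\mathbf{U}}, id)$ are shorthand introduced here.

```latex
\begin{align*}
e(C_1, D_1) &= e\!\left(g^{s},\, A^{tr} H^{t\hat r}\right)
             = e(g, A)^{str}\; e(g, H)^{st\hat r} \\
e(C_2, D_2) &= e\!\left(g^{t\hat s},\, \hat A^{r} \hat H^{\hat r}\right)
             = e(g, \hat A)^{\hat s t r}\; e(g, \hat H)^{\hat s t \hat r} \\
e(C_3, D_3) &= e\!\left(A^{s} \hat A^{\hat s},\, g^{-tr}\right)
             = e(g, A)^{-str}\; e(g, \hat A)^{-\hat s t r} \\
e(C_4, D_4) &= e\!\left(H^{s} \hat H^{\hat s},\, g^{-t\hat r}\right)
             = e(g, H)^{-st\hat r}\; e(g, \hat H)^{-\hat s t \hat r}
\end{align*}
```

Multiplying the four lines, every factor on the right cancels against its inverse, so the product is $1_T$ and decryption returns 0. For $M = 1$ the ciphertext components are independent random group elements, and the product equals $1_T$ only with small probability, which is the source of the non-zero decryption error.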


4.3  Our  E-IBTDF  and  IB-TDF 


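Our E-IBTDF $E[n, \mu, \mathrm{IDSp}]$ is associated to any integers $n, \mu \ge 1$ and any identity space $\mathrm{IDSp} \subseteq \mathbb{Z}_p^{\mu}$. It
has message space $\{0,1\}^n$ and auxiliary input space $\mathbb{Z}_p^{\mu+1}$, and the algorithms are as follows:

1. Parameters: Given auxiliary input $y$, algorithm $E[n, \mu, \mathrm{IDSp}].\mathrm{Pg}$ lets $g \xleftarrow{\$} \mathbb{G}^*$ ; $t \xleftarrow{\$} \mathbb{Z}_p^*$ ; $\hat g \leftarrow g^t$ ; $U \xleftarrow{\$} \mathbb{G}^*$.
It then lets $\mathbf{H}, \hat{\mathbf{H}} \xleftarrow{\$} \mathbb{G}^n$ ; $\mathbf{V}, \hat{\mathbf{V}} \xleftarrow{\$} \mathbb{G}^{n \times (\mu+1)}$ and $s \xleftarrow{\$} (\mathbb{Z}_p^*)^n$ ; $\hat s \xleftarrow{\$} \mathbb{Z}_p^n$. It returns
$pars = (g, \hat g, \mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \mathbf{W}, \mathbf{H}, \hat{\mathbf{H}}, \mathbf{V}, \hat{\mathbf{V}}, U)$ as the public parameters and $msk = t$ as the master secret key
where for $1 \le i, j \le n$ and $0 \le k \le \mu$:

\[ \mathbf{G}[i] \leftarrow g^{s[i]} ; \quad \hat{\mathbf{G}}[i] \leftarrow \hat g^{\hat s[i]} ; \quad \mathbf{J}[i,j] \leftarrow \mathbf{H}[j]^{s[i]} \hat{\mathbf{H}}[j]^{\hat s[i]} ; \quad
   \mathbf{W}[i,j,k] \leftarrow \mathbf{V}[j,k]^{s[i]} \hat{\mathbf{V}}[j,k]^{\hat s[i]} U^{s[i]\, y[k]\, \Delta(i,j)} , \]

where we recall that $\Delta(i,j) = 1$ if $i = j$ and 0 otherwise is the Kronecker delta function.

2. Key generation: Given parameters $(g, \hat g, \mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \mathbf{W}, \mathbf{H}, \hat{\mathbf{H}}, \mathbf{V}, \hat{\mathbf{V}}, U)$, master secret $t$ and identity
$id \in \mathrm{IDSp}$, algorithm $E[n, \mu, \mathrm{IDSp}].\mathrm{Kg}$ returns decryption key $(D_1, D_2, D_3, D_4)$ where $r \xleftarrow{\$} (\mathbb{Z}_p^*)^n$ ; $\hat r \xleftarrow{\$} \mathbb{Z}_p^n$
and for $1 \le i \le n$

\[ D_1[i] \leftarrow \mathcal{H}(\mathbf{V}[i,\cdot], id)^{t r[i]} \cdot \mathbf{H}[i]^{t \hat r[i]} ; \quad
   D_2[i] \leftarrow \mathcal{H}(\hat{\mathbf{V}}[i,\cdot], id)^{r[i]} \cdot \hat{\mathbf{H}}[i]^{\hat r[i]} ; \quad
   D_3[i] \leftarrow g^{-t r[i]} ; \quad D_4[i] \leftarrow g^{-t \hat r[i]} . \]

3. Evaluate: Given parameters $(g, \hat g, \mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \mathbf{W}, \mathbf{H}, \hat{\mathbf{H}}, \mathbf{V}, \hat{\mathbf{V}}, U)$, identity $id \in \mathrm{IDSp}$ and input
$x \in \{0,1\}^n$, algorithm $E[n, \mu, \mathrm{IDSp}].\mathrm{Ev}$ returns $(C_1, C_2, C_3, C_4)$ where for $1 \le j \le n$

\[ C_1 \leftarrow \prod_{i=1}^{n} \mathbf{G}[i]^{x[i]} ; \quad C_2 \leftarrow \prod_{i=1}^{n} \hat{\mathbf{G}}[i]^{x[i]} ; \quad
   C_3[j] \leftarrow \prod_{i=1}^{n} \prod_{k=0}^{\mu} \mathbf{W}[i,j,k]^{x[i]\, \overline{id}[k]} ; \quad
   C_4[j] \leftarrow \prod_{i=1}^{n} \mathbf{J}[i,j]^{x[i]} . \]

4. Invert: Given parameters $(g, \hat g, \mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \mathbf{W}, \mathbf{H}, \hat{\mathbf{H}}, \mathbf{V}, \hat{\mathbf{V}}, U)$, identity $id \in \mathrm{IDSp}$, decryption key
$(D_1, D_2, D_3, D_4)$ for $id$ and output (ciphertext) $(C_1, C_2, C_3, C_4)$, algorithm $E[n, \mu, \mathrm{IDSp}].\mathrm{Ev}^{-1}$ returns
$x \in \{0,1\}^n$ where for $1 \le j \le n$ it sets $x[j] = 0$ if $e(C_1, D_1[j])\, e(C_2, D_2[j])\, e(C_3[j], D_3[j])\, e(C_4[j], D_4[j]) = 1_T$
and 1 otherwise.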

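Invertibility. We observe that if parameters $(g, \hat g, \mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \mathbf{W}, \mathbf{H}, \hat{\mathbf{H}}, \mathbf{V}, \hat{\mathbf{V}}, U)$ were generated with
auxiliary input $y$ and $(C_1, C_2, C_3, C_4) = E[n, \mu, \mathrm{IDSp}].\mathrm{Ev}((g, \hat g, \mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \mathbf{W}), id, x)$ then for $1 \le j \le n$

\[ C_1 = \prod_{i=1}^{n} \mathbf{G}[i]^{x[i]} = g^{\langle s, x \rangle} \tag{3} \]
\[ C_2 = \prod_{i=1}^{n} \hat{\mathbf{G}}[i]^{x[i]} = \hat g^{\langle \hat s, x \rangle} \tag{4} \]
\[ C_3[j] = \prod_{i=1}^{n} \prod_{k=0}^{\mu} \mathbf{V}[j,k]^{s[i] x[i] \overline{id}[k]}\, \hat{\mathbf{V}}[j,k]^{\hat s[i] x[i] \overline{id}[k]}\, U^{s[i] x[i] y[k] \overline{id}[k] \Delta(i,j)} \]
\[ \quad = \prod_{i=1}^{n} \mathcal{H}(\mathbf{V}[j,\cdot], id)^{s[i] x[i]}\, \mathcal{H}(\hat{\mathbf{V}}[j,\cdot], id)^{\hat s[i] x[i]}\, U^{s[i] x[i] f(y,id) \Delta(i,j)} \]
\[ \quad = \mathcal{H}(\mathbf{V}[j,\cdot], id)^{\langle s, x \rangle}\, \mathcal{H}(\hat{\mathbf{V}}[j,\cdot], id)^{\langle \hat s, x \rangle}\, U^{s[j] x[j] f(y,id)} \tag{5} \]
\[ C_4[j] = \prod_{i=1}^{n} \mathbf{H}[j]^{s[i] x[i]}\, \hat{\mathbf{H}}[j]^{\hat s[i] x[i]} = \mathbf{H}[j]^{\langle s, x \rangle}\, \hat{\mathbf{H}}[j]^{\langle \hat s, x \rangle} . \tag{6} \]

Thus if $x[j] = 0$ then $(C_1, C_2, C_3[j], C_4[j])$ is an encryption, under our base IBE scheme, of the
message 0, with coins $\langle s, x \rangle \bmod p$, $\langle \hat s, x \rangle \bmod p$, parameters $(g, \hat g, \mathbf{H}[j], \hat{\mathbf{H}}[j], \mathbf{V}[j,\cdot], \hat{\mathbf{V}}[j,\cdot])$ and identity $id$.
The inversion algorithm will thus correctly recover $x[j] = 0$. On the other hand suppose $x[j] = 1$.
Then $e(C_1, D_1[j])\, e(C_2, D_2[j])\, e(C_3[j], D_3[j])\, e(C_4[j], D_4[j]) = e(U^{s[j] x[j] f(y,id)}, D_3[j])$. Now suppose
$f(y, id) \bmod p \ne 0$. Then $U^{s[j] x[j] f(y,id)} \ne 1$ because we chose $s[j]$ to be non-zero modulo $p$ and $D_3[j] \ne 1$
because we chose $r[j]$ to be non-zero modulo $p$. So the result of the pairing is never $1_T$, meaning the
inversion algorithm will again correctly recover $x[j] = 1$. We have established that auxiliary input $y$
grants invertibility, meaning induced IBTDF $E[n, \mu, \mathrm{IDSp}](y)$ satisfies the correct inversion condition, if
$f(y, id) \bmod p \ne 0$ for all $id \in \mathrm{IDSp}$.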

Our IBTDF. We associate to any integers $n, \mu \ge 1$ and any identity space $\mathrm{IDSp} \subseteq \mathbb{Z}_p^{\mu}$ the IBTDF
scheme induced by our E-IBTDF $E[n, \mu, \mathrm{IDSp}]$ via auxiliary input $y = (1, 0, \ldots, 0) \in \mathbb{Z}_p^{\mu+1}$, and denote
this IBTDF scheme by $F[n, \mu, \mathrm{IDSp}]$. This IBTDF satisfies the correct inversion requirement because
$f(y, id) = \overline{id}[0] = 1 \not\equiv 0 \pmod p$ for all $id$. We will show that this IBTDF is selective-id secure when
$\mu = 1$ and $\mathrm{IDSp} = \mathbb{Z}_p$, and adaptive-id secure when $\mathrm{IDSp} = \{0,1\}^{\mu}$. In the first case, it is fully lossy
(i.e. 1-lossy) and in the second it is $\delta$-lossy for appropriate $\delta$. First we prove two technical lemmas that
we will use in both cases.
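
The algebra behind invertibility can be exercised in a toy model. The sketch below is not a secure or faithful implementation: as a loudly stated assumption, every group element is represented by its discrete logarithm modulo a small prime `P`, so group multiplication becomes addition, exponentiation becomes multiplication, and the pairing of two elements becomes the product of their exponents (with 0 playing the role of $1_T$). All function names are hypothetical.

```python
import random

P = 1009  # toy prime group order (assumption); elements are stored as exponents mod P

def zp():
    return random.randrange(P)          # element of Z_p, or a random element of G

def zp_star():
    return random.randrange(1, P)       # element of Z_p^*, or a random element of G*

def id_bar(ident):
    return [1] + list(ident)

def hash_h(vec, ident):
    """H(vec, id) = prod_k vec[k]^{id_bar[k]}, written additively on exponents."""
    return sum(v * e for v, e in zip(vec, id_bar(ident))) % P

def pg(n, mu, y):
    g, t, U = zp_star(), zp_star(), zp_star()
    ghat = g * t % P
    H = [zp() for _ in range(n)]
    Hh = [zp() for _ in range(n)]
    V = [[zp() for _ in range(mu + 1)] for _ in range(n)]
    Vh = [[zp() for _ in range(mu + 1)] for _ in range(n)]
    s = [zp_star() for _ in range(n)]
    sh = [zp() for _ in range(n)]
    G = [g * s[i] % P for i in range(n)]
    Gh = [ghat * sh[i] % P for i in range(n)]
    J = [[(H[j] * s[i] + Hh[j] * sh[i]) % P for j in range(n)] for i in range(n)]
    W = [[[(V[j][k] * s[i] + Vh[j][k] * sh[i] + U * s[i] * y[k] * (i == j)) % P
           for k in range(mu + 1)] for j in range(n)] for i in range(n)]
    pars = dict(g=g, ghat=ghat, G=G, Gh=Gh, J=J, W=W, H=H, Hh=Hh, V=V, Vh=Vh, U=U)
    return pars, t

def kg(pars, t, ident):
    keys = []
    for i in range(len(pars["H"])):
        r, rh = zp_star(), zp()
        d1 = (hash_h(pars["V"][i], ident) * t * r + pars["H"][i] * t * rh) % P
        d2 = (hash_h(pars["Vh"][i], ident) * r + pars["Hh"][i] * rh) % P
        d3 = (-pars["g"] * t * r) % P
        d4 = (-pars["g"] * t * rh) % P
        keys.append((d1, d2, d3, d4))
    return keys

def ev(pars, ident, x):
    n, ib = len(x), id_bar(ident)
    c1 = sum(pars["G"][i] * x[i] for i in range(n)) % P
    c2 = sum(pars["Gh"][i] * x[i] for i in range(n)) % P
    c3 = [sum(pars["W"][i][j][k] * x[i] * ib[k]
              for i in range(n) for k in range(len(ib))) % P for j in range(n)]
    c4 = [sum(pars["J"][i][j] * x[i] for i in range(n)) % P for j in range(n)]
    return c1, c2, c3, c4

def ev_inv(pars, ident, keys, c):
    c1, c2, c3, c4 = c
    x = []
    for j, (d1, d2, d3, d4) in enumerate(keys):
        # Product of four pairings becomes a sum of four exponent products.
        pairing_product = (c1 * d1 + c2 * d2 + c3[j] * d3 + c4[j] * d4) % P
        x.append(0 if pairing_product == 0 else 1)
    return x
```

With auxiliary input `[1, 0]` (and `mu = 1`), `ev_inv` recovers every input bit, which mirrors the invertibility argument. With an auxiliary input for which $f(y, id) \equiv 0 \pmod p$, for instance `[(-5) % P, 1]` and identity `(5,)`, the `U` term vanishes from `c3` and inversion returns all zeros, illustrating why such an identity cannot be inverted.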


4.4  Ciphertext  pseudorandomness  lemma 

Consider games ReC, RaC of Figure [3] associated to some choice of $\mathrm{IDSp} \subseteq \mathbb{Z}_p^{\mu}$. The adversary provides
the Initialize procedure with an auxiliary input $y \in \mathbb{Z}_p^{\mu+1}$. Parameters are generated as per our base
IBE scheme with the addition of $U$. The decryption key for $id$ is computed as per our base IBE scheme
except that the games refuse to provide it when $f(y, id) = 0$. The challenge oracle, however, does not
return ciphertexts of our IBE scheme. In game ReC, it returns group elements that resemble diagonal
entries of the matrices in the parameters of our E-IBTDF, and in game RaC it returns random group
elements. Notice that the challenge oracle does not take an identity as input. (Indeed, it has no input.)
As usual it must be invoked exactly once. The following lemma says the games are indistinguishable
under DLIN. The proof is in Section 4.7.

Lemma 4.1 Let $\mu \ge 1$ be an integer and $\mathrm{IDSp} \subseteq \mathbb{Z}_p^{\mu}$. Let $P$ be an adversary. Then there is an adversary
$B$ such that

\[ \Pr[\mathrm{ReC}^P] - \Pr[\mathrm{RaC}^P] \le (\mu + 2) \cdot \mathbf{Adv}^{\mathrm{dlin}}(B) . \tag{7} \]

The running time of $B$ is that of $P$ plus some overhead.
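
proc Initialize($y$)  // ReC, RaC
  $(pars, msk) \xleftarrow{\$} \mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Pg}$
  $(g, \hat g, H, \hat H, \mathbf{U}, \hat{\mathbf{U}}) \leftarrow pars$
  $U \xleftarrow{\$} \mathbb{G}^*$
  Return $(g, \hat g, H, \hat H, \mathbf{U}, \hat{\mathbf{U}}, U)$

proc GetDK($id$)  // ReC, RaC
  If $f(y, id) = 0$ then $dk \leftarrow \bot$
  Else $dk \xleftarrow{\$} \mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Kg}(pars, msk, id)$
  Return $dk$

proc Ch()  // ReC
  $s \xleftarrow{\$} \mathbb{Z}_p^*$ ; $\hat s \xleftarrow{\$} \mathbb{Z}_p$ ; $G \leftarrow g^{s}$ ; $\hat G \leftarrow \hat g^{\hat s}$ ; $S \leftarrow H^{s} \hat H^{\hat s}$
  For $k = 0, \ldots, \mu$ do $Z[k] \leftarrow (U^{y[k]} \mathbf{U}[k])^{s}\, \hat{\mathbf{U}}[k]^{\hat s}$
  Return $(G, \hat G, S, Z)$

proc Ch()  // RaC
  $G, \hat G, S \xleftarrow{\$} \mathbb{G}$ ; $Z \xleftarrow{\$} \mathbb{G}^{\mu+1}$
  Return $(G, \hat G, S, Z)$

proc Finalize($d'$)  // ReC, RaC
  Return $(d' = 1)$

Figure 3: Games ReC (“Real Ciphertexts”) and RaC (“Random Ciphertexts”) associated to $\mathrm{IDSp} \subseteq \mathbb{Z}_p^{\mu}$.


proc Initialize($id$)
  $y_0 \xleftarrow{\$} \mathrm{Aux}(id)$ ; $y_1 \leftarrow (1, 0, \ldots, 0)$ ; $\mathrm{Win} \leftarrow$ true
  $g \xleftarrow{\$} \mathbb{G}^*$ ; $t \xleftarrow{\$} \mathbb{Z}_p^*$ ; $\hat g \leftarrow g^{t}$ ; $U \xleftarrow{\$} \mathbb{G}^*$
  $\mathbf{H}, \hat{\mathbf{H}} \xleftarrow{\$} \mathbb{G}^{n}$ ; $\mathbf{V}, \hat{\mathbf{V}} \xleftarrow{\$} \mathbb{G}^{n \times (\mu+1)}$ ; $s \xleftarrow{\$} (\mathbb{Z}_p^*)^{n}$ ; $\hat s \xleftarrow{\$} \mathbb{Z}_p^{n}$
  For $i = 1, \ldots, n$ do
    $\mathbf{G}[i] \leftarrow g^{s[i]}$ ; $\hat{\mathbf{G}}[i] \leftarrow \hat g^{\hat s[i]}$
    For $j = 1, \ldots, n$ do
      $\mathbf{J}[i,j] \leftarrow \mathbf{H}[j]^{s[i]} \hat{\mathbf{H}}[j]^{\hat s[i]}$
      For $k = 0, \ldots, \mu$ do
        If ($i = j$ and $i \le l$) then $\mathbf{W}[i,j,k] \xleftarrow{\$} \mathbb{G}$
        Else $\mathbf{W}[i,j,k] \leftarrow \mathbf{V}[j,k]^{s[i]} \hat{\mathbf{V}}[j,k]^{\hat s[i]} U^{s[i]\, y_b[k]\, \Delta(i,j)}$
  $pars \leftarrow (g, \hat g, \mathbf{G}, \hat{\mathbf{G}}, \mathbf{J}, \mathbf{W}, \mathbf{H}, \hat{\mathbf{H}}, \mathbf{V}, \hat{\mathbf{V}}, U)$ ; $msk \leftarrow t$
  $IS \leftarrow \emptyset$ ; $id^* \leftarrow id$
  Return $pars$

proc GetDK($id$)
  $IS \leftarrow IS \cup \{id\}$
  If $f(y_0, id) = 0$ then $\mathrm{Win} \leftarrow$ false ; $dk \leftarrow \bot$
  Else $dk \xleftarrow{\$} E[n, \mu, \mathrm{IDSp}].\mathrm{Kg}(pars, msk, id)$
  Return $dk$

proc Ch($id$)
  $id^* \leftarrow id$
  If $f(y_0, id) \ne 0$ then $\mathrm{Win} \leftarrow$ false

proc Finalize($d'$)
  Return $((d' = 1)$ and $(id^* \notin IS)$ and $\mathrm{Win})$

Figure 4: Games $\mathrm{RL}_{l,b}$ ($0 \le l \le n$ and $b \in \{0,1\}$) associated to $n, \mu, \mathrm{IDSp}, \mathrm{Aux}$ for proof of Lemma 4.2.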


4.5 Proof of Lemma 4.2

Consider  the  games  of  Figure  []]  Game  RLj  &  makes  the  diagonal  entries  of  W  (namely  all  the  g  +  1 
entries  with  i  =  j)  random  for  i  <  l  and  otherwise  makes  them  using  y^.  Game  RLo,i  is  the  same  as 
game  RLo  and  game  RLq.o  is  the  same  as  game  RL„.  Games  RLn  o,  RL„i  are  identical:  both  make  all 
diagonal  entries  of  W  (meaning,  i  =  j)  random,  and  when  i  /  j  we  have  A(i,  j)  =  0  so  y b(k)  has  no 
impact  on  W[i,j,  k]  in  the  Else  statement.  Thus  we  have 

Pi' [RLo  ]  -  Pr[RL/]  =  (Pr[RL^]  -  Pr[RL^])  +  (Pr[RL/0]  -  Pr[RL^0])  . 

We  will  design  adversaries  Pq,  P\  so  that 

Pr[ReCp°]  -  Pr[RaCp°]  =  i  •  (Pr[RL/0]  -  Pr[RL^0])  (8) 

Pr[ReCPl]  -  Pr[RaCPl]  =  ^  •  (Pr[RL^]  -  PrfRL^])  .  (9) 

Adversary  P  picks  b  £-  {0,1}  and  runs  Pi,.  This  yields  Equation  (1101).  Now  we  present  adversary  P^ 
(b  £  {0, 1}).  It  runs  adversary  A,  responding  to  its  oracle  queries  as  follows. 
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When  A  makes  query  Initialize^),  adversary  Pi,  begins  with 

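$y_0 \xleftarrow{\$} \mathrm{Aux}(id)$ ; $y_1 \leftarrow (1, 0, \ldots, 0)$ ; $\mathrm{Win}_A \leftarrow$ true ; $\mathrm{IS}_A \leftarrow \emptyset$
$(g, \hat{g}, H, \hat{H}, U, \hat{U}, \tilde{U}) \leftarrow$ Initialize($y_b$) ; $(G, \hat{G}, S, Z) \xleftarrow{\$}$ Ch().

Here $P_b$ has called its own Initialize procedure with input $y_b$, and then called its Ch procedure. Now it
creates parameters pars for $A$ as follows: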

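$h, \hat{h} \xleftarrow{\$} \mathbb{Z}_p^n$ ; $v, \hat{v} \xleftarrow{\$} \mathbb{Z}_p^{n \times (\mu+1)}$ ; $s \xleftarrow{\$} (\mathbb{Z}_p^*)^n$ ; $\hat{s} \xleftarrow{\$} \mathbb{Z}_p^n$
For $i = 1, \ldots, n$ do
  If ($i = l$) then $H[i] \leftarrow H$ ; $\hat{H}[i] \leftarrow \hat{H}$ ; $G[i] \leftarrow G$ ; $\hat{G}[i] \leftarrow \hat{G}$
  If ($i \ne l$) then $H[i] \leftarrow g^{h[i]}$ ; $\hat{H}[i] \leftarrow \hat{g}^{\hat{h}[i]}$ ; $G[i] \leftarrow g^{s[i]}$ ; $\hat{G}[i] \leftarrow \hat{g}^{\hat{s}[i]}$
  For $k = 0, \ldots, \mu$ do
    If ($i = l$) then $V[i,k] \leftarrow U[k]$ ; $\hat{V}[i,k] \leftarrow \hat{U}[k]$
    If ($i \ne l$) then $V[i,k] \leftarrow g^{v[i,k]}$ ; $\hat{V}[i,k] \leftarrow \hat{g}^{\hat{v}[i,k]}$
For $i = 1, \ldots, n$ do
  For $j = 1, \ldots, n$ do
    If ($i = l$ and $j = i$) then $J[i,j] \leftarrow S$
    If ($i = l$ and $j \ne i$) then $J[i,j] \leftarrow G^{h[j]} \hat{G}^{\hat{h}[j]}$
    If ($i \ne l$) then $J[i,j] \leftarrow H[j]^{s[i]} \hat{H}[j]^{\hat{s}[i]}$
    For $k = 0, \ldots, \mu$ do
      If ($i = j$ and $i \le l - 1$) then $W[i,j,k] \xleftarrow{\$} \mathbb{G}$
      If ($i = j$ and $i = l$) then $W[i,j,k] \leftarrow Z[k]$
      Else $W[i,j,k] \leftarrow V[j,k]^{s[i]} \hat{V}[j,k]^{\hat{s}[i]} \tilde{U}^{s[i]\, y_b[k]\, \Delta(i,j)}$
pars $\leftarrow (g, \hat{g}, G, \hat{G}, J, W, H, \hat{H}, V, \hat{V}, \tilde{U})$

It returns pars to $A$.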

When adversary $A$ makes query GetDK(id), adversary $P_b$ proceeds as follows. In this code, GetDK is
$P_b$'s own oracle:

$\mathrm{IS}_A \leftarrow \mathrm{IS}_A \cup \{id\}$

If $f(y_0, id) = 0$ then $\mathrm{Win}_A \leftarrow$ false ; $dk \leftarrow \bot$

Else

(Pi,  £>2,  P3 ;  P>4)  A  GetDK(id) 
r'  A  (Zpn  ;  f'  4-  Z£ 

For  i  =  1, . . . ,  n  do 

If  i  =  /  then  (Di[i],  D2[i],  D3[i],  D4[i])  <-  (Du  D2,  D3,  D±) 

Else 

Di[i]  <-  P(V[i,  •],  id)r'W H[i]f,M  :  D2 [i]  <- 
D3[i]^<?-r'[i];  D4[i]<-«rf'M 
dfc  •<—  (Di, D2, D3, D4) 

It returns $dk$ to $A$. Notice that $P_b$'s invocation of GetDK will never return $\bot$. In the case $b = 1$ this is
true because $f(y_1, \cdot) = 1 \ne 0$. In the case $b = 0$ it is true because the case $f(y_0, id) = 0$ was excluded by
the If statement. To justify the above simulation, define $r, \hat{r}$ by $r[i] = r'[i]/t$ and $\hat{r}[i] = \hat{r}'[i]/t$ for $i \ne l$
and $r[l], \hat{r}[l]$ as the randomness underlying $(D_1, D_2, D_3, D_4)$. Then think of $r, \hat{r}$ as the randomness used
by the real key generation algorithm. Here $t$ is the secret key, so that $\hat{g} = g^t$.

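When adversary $A$ makes query Ch(id), adversary $P_b$ proceeds as follows:
$id^* \leftarrow id$

If $f(y_0, id) \ne 0$ then $\mathrm{Win}_A \leftarrow$ false.

proc Initialize(id)  // $\mathrm{RL}_0$
$y_0 \xleftarrow{\$} \mathrm{Aux}(id)$ ; $y_1 \leftarrow (1, 0, \ldots, 0)$
$(pars, msk) \xleftarrow{\$} E[n, \mu, \mathrm{IDSp}].\mathrm{Pg}(y_1)$
IS $\leftarrow \emptyset$ ; $id^* \leftarrow id$ ; Win $\leftarrow$ true
Return pars

proc Initialize(id)  // $\mathrm{RL}_n$
$y_0 \xleftarrow{\$} \mathrm{Aux}(id)$ ; $y_1 \leftarrow (1, 0, \ldots, 0)$
$(pars, msk) \xleftarrow{\$} E[n, \mu, \mathrm{IDSp}].\mathrm{Pg}(y_0)$
IS $\leftarrow \emptyset$ ; $id^* \leftarrow id$ ; Win $\leftarrow$ true
Return pars

proc GetDK(id)  // $\mathrm{RL}_0$, $\mathrm{RL}_n$
IS $\leftarrow$ IS $\cup \{id\}$
If $f(y_0, id) = 0$ then Win $\leftarrow$ false ; $dk \leftarrow \bot$
Else $dk \xleftarrow{\$} E[n, \mu, \mathrm{IDSp}].\mathrm{Kg}(pars, msk, id)$
Return $dk$

proc Ch(id)  // $\mathrm{RL}_0$, $\mathrm{RL}_n$
$id^* \leftarrow id$
If $f(y_0, id) \ne 0$ then Win $\leftarrow$ false

proc Finalize($d'$)  // $\mathrm{RL}_0$, $\mathrm{RL}_n$
Return (($d' = 1$) and ($id^* \notin$ IS) and Win)


Figure 5: Games $\mathrm{RL}_0$, $\mathrm{RL}_n$ ("Real-to-Lossy") associated to $n$, $\mu$, IDSp $\subseteq \mathbb{Z}_p^{\mu}$ and auxiliary input generator
algorithm Aux.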


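Finally, $A$ halts with output $d'$. Adversaries $P_0$, $P_1$ compute their output differently. Adversary $P_1$ returns
1 if

($d' = 1$) and ($id^* \notin \mathrm{IS}_A$) and $\mathrm{Win}_A$

and 0 otherwise. Adversary $P_0$ does the opposite, returning 0 if the above condition is true and 1
otherwise. We obtain Equations (8), (9) as follows:

\begin{align*}
\Pr[\mathrm{ReC}^{P_1}] - \Pr[\mathrm{RaC}^{P_1}]
 &= \frac{1}{n} \sum_{l=1}^{n} \Bigl( \Pr[\mathrm{RL}_{l-1,1}^A] - \Pr[\mathrm{RL}_{l,1}^A] \Bigr) \\
 &= \frac{1}{n} \Bigl( \Pr[\mathrm{RL}_{0,1}^A] - \Pr[\mathrm{RL}_{n,1}^A] \Bigr) \\
\Pr[\mathrm{ReC}^{P_0}] - \Pr[\mathrm{RaC}^{P_0}]
 &= \frac{1}{n} \sum_{l=1}^{n} \Bigl( (1 - \Pr[\mathrm{RL}_{l-1,0}^A]) - (1 - \Pr[\mathrm{RL}_{l,0}^A]) \Bigr) \\
 &= \frac{1}{n} \sum_{l=1}^{n} \Bigl( \Pr[\mathrm{RL}_{l,0}^A] - \Pr[\mathrm{RL}_{l-1,0}^A] \Bigr) \\
 &= \frac{1}{n} \Bigl( \Pr[\mathrm{RL}_{n,0}^A] - \Pr[\mathrm{RL}_{0,0}^A] \Bigr) .
\end{align*}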


4.6 Real-to-lossy lemma

Consider games $\mathrm{RL}_0$, $\mathrm{RL}_n$ of Figure 5 associated to some choice of $n$, $\mu$, IDSp $\subseteq \mathbb{Z}_p^{\mu}$ and auxiliary input
generator Aux for $E[n, \mu, \mathrm{IDSp}]$. The latter is an algorithm that takes as input an identity in IDSp and returns
an auxiliary input in $\mathbb{Z}_p^{\mu+1}$. Game $\mathrm{RL}_0$ obtains an auxiliary input $y_0$ via Aux but generates parameters
exactly as $E[n, \mu, \mathrm{IDSp}].\mathrm{Pg}$ with the real auxiliary input $y_1$. The game will return true under the same
condition as game Real but additionally requiring that $f(y_0, id) \ne 0$ for all GetDK(id) queries and
$f(y_0, id) = 0$ for the Ch(id) query. Game $\mathrm{RL}_n$ generates parameters with the auxiliary input provided
by Aux but is otherwise identical to game $\mathrm{RL}_0$. The following lemma says it is hard to distinguish these
games. We will apply this by defining Aux in such a way that its output $y_0$ results in a lossy setup. The
proof of the following is in Section 4.5.

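Lemma 4.2 Let $n, \mu \ge 1$ be integers and IDSp $\subseteq \mathbb{Z}_p^{\mu}$. Let Aux be an auxiliary input generator for
$E[n, \mu, \mathrm{IDSp}]$ and $A$ an adversary. Then there is an adversary $P$ such that

\[ \Pr[\mathrm{RL}_0^A] - \Pr[\mathrm{RL}_n^A] \le 2n \cdot \bigl( \Pr[\mathrm{ReC}^P] - \Pr[\mathrm{RaC}^P] \bigr) . \qquad (10) \]

proc Initialize($y$)  // PC, $\mathrm{PC}_l$
$(pars, msk) \xleftarrow{\$} \mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Pg}$
$(g, \hat{g}, H, \hat{H}, U, \hat{U}) \leftarrow pars$
$\tilde{U} \xleftarrow{\$} \mathbb{G}^*$
Return $(g, \hat{g}, H, \hat{H}, U, \hat{U}, \tilde{U})$

proc GetDK(id)  // PC, $\mathrm{PC}_l$
If $f(y, id) = 0$ then $dk \leftarrow \bot$
Else $dk \xleftarrow{\$} \mathrm{IBE}[\mu, \mathrm{IDSp}].\mathrm{Kg}(pars, msk, id)$
Return $dk$

proc Ch()  // PC
$s \xleftarrow{\$} \mathbb{Z}_p^*$ ; $\hat{s} \xleftarrow{\$} \mathbb{Z}_p$ ; $G \leftarrow g^s$ ; $\hat{G} \leftarrow \hat{g}^{\hat{s}}$ ; $S \leftarrow H^s \hat{H}^{\hat{s}}$
For $k = 0, \ldots, \mu$ do $Z[k] \leftarrow (\tilde{U}^{y[k]} U[k])^s\, \hat{U}[k]^{\hat{s}}$
Return $(G, \hat{G}, S, Z)$

proc Ch()  // $\mathrm{PC}_l$
$s \xleftarrow{\$} \mathbb{Z}_p^*$ ; $\hat{s} \xleftarrow{\$} \mathbb{Z}_p$ ; $G \leftarrow g^s$ ; $\hat{G} \leftarrow \hat{g}^{\hat{s}}$ ; $S \xleftarrow{\$} \mathbb{G}$
For $k = 0, \ldots, l - 1$ do $Z[k] \xleftarrow{\$} \mathbb{G}$
For $k = l, \ldots, \mu$ do $Z[k] \leftarrow (\tilde{U}^{y[k]} U[k])^s\, \hat{U}[k]^{\hat{s}}$
Return $(G, \hat{G}, S, Z)$

proc Finalize($d'$)  // PC, $\mathrm{PC}_l$
Return ($d' = 1$)


Figure 6: Games PC, $\mathrm{PC}_l$ ($0 \le l \le \mu + 1$) associated to IDSp $\subseteq \mathbb{Z}_p^{\mu+1}$ for the proof of Lemma 4.1.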


The  running  time  of  P  is  that  of  A  plus  some  overhead.  If  A  is  selective-id  then  so  is  P. 

The  last  statement  allows  us  to  use  the  lemma  in  both  the  selective-id  and  adaptive-id  cases. 

4.7 Proof of Lemma 4.1

Consider the games of Figure 6. Game PC is the same as game ReC. Game $\mathrm{PC}_l$ ($0 \le l \le \mu + 1$) makes
$S$ random and also makes the entries $Z[0], \ldots, Z[l-1]$ random and the rest real. Thus $\mathrm{PC}_{\mu+1}$ is the same
as RaC. We will design adversaries $B_1$, $B_2$ so that

\[ \mathbf{Adv}^{\mathrm{dlin}}(B_1) = \Pr[\mathrm{PC}^P] - \Pr[\mathrm{PC}_0^P] \qquad (11) \]

\[ \mathbf{Adv}^{\mathrm{dlin}}(B_2) = \frac{1}{\mu+1} \cdot \bigl( \Pr[\mathrm{PC}_0^P] - \Pr[\mathrm{PC}_{\mu+1}^P] \bigr) \qquad (12) \]

Adversary $B$ will run $B_1$ with probability $1/(\mu + 2)$ and $B_2$ with probability $(\mu + 1)/(\mu + 2)$. This yields
the bound claimed in Lemma 4.1.
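
The combination step can be spelled out as follows. This is a sketch that assumes the advantage of $B$ is the probability-weighted average of the advantages of $B_1$ and $B_2$ (the usual reading of "run $B_1$ with probability $1/(\mu+2)$"), and it uses that PC equals ReC and $\mathrm{PC}_{\mu+1}$ equals RaC:

```latex
\begin{align*}
\mathbf{Adv}^{\mathrm{dlin}}(B)
 &= \frac{1}{\mu+2}\,\mathbf{Adv}^{\mathrm{dlin}}(B_1)
  + \frac{\mu+1}{\mu+2}\,\mathbf{Adv}^{\mathrm{dlin}}(B_2) \\
 &= \frac{1}{\mu+2}\Bigl(\Pr[\mathrm{PC}^P]-\Pr[\mathrm{PC}_0^P]\Bigr)
  + \frac{1}{\mu+2}\Bigl(\Pr[\mathrm{PC}_0^P]-\Pr[\mathrm{PC}_{\mu+1}^P]\Bigr) \\
 &= \frac{1}{\mu+2}\Bigl(\Pr[\mathrm{ReC}^P]-\Pr[\mathrm{RaC}^P]\Bigr).
\end{align*}
```

Together with the factor $2n$ of Lemma 4.2, this accounts for the $2n(\mu+2)$ loss that appears in the theorems of Sections 4.8 and 4.9.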

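On input $(g, \hat{g}, g^s, \hat{g}^{\hat{s}}, H, T)$ where $T$ is either $H^{s+\hat{s}}$ or random, adversary $B_1$ runs adversary $P$, responding
to its oracle queries as follows. When $P$ makes query Initialize($y$), adversary $B_1$ lets

$u, \hat{u} \xleftarrow{\$} \mathbb{Z}_p^{\mu+1}$ ; $\tilde{u}, v \xleftarrow{\$} \mathbb{Z}_p$ ; $\hat{H} \leftarrow H \hat{g}^{v}$ ; $\tilde{U} \leftarrow g^{\tilde{u}}$
For $k = 0, \ldots, \mu$ do $U[k] \leftarrow \tilde{U}^{-y[k]} g^{u[k]}$ ; $\hat{U}[k] \leftarrow \hat{g}^{\hat{u}[k]}$

It returns $(g, \hat{g}, H, \hat{H}, U, \hat{U}, \tilde{U})$ to $P$. When $P$ makes its (single) Ch() query, adversary $B_1$ lets

$S \leftarrow T \cdot (\hat{g}^{\hat{s}})^{v}$

For $k = 0, \ldots, \mu$ do $Z[k] \leftarrow (g^{s})^{u[k]} (\hat{g}^{\hat{s}})^{\hat{u}[k]}$

It returns $(g^s, \hat{g}^{\hat{s}}, S, Z)$ to $P$. Notice that for $0 \le k \le \mu$

\[ Z[k] = g^{s u[k]} \hat{g}^{\hat{s} \hat{u}[k]} = \bigl(\tilde{U}^{y[k]} \tilde{U}^{-y[k]} g^{u[k]}\bigr)^{s} \bigl(\hat{g}^{\hat{u}[k]}\bigr)^{\hat{s}} = \bigl(\tilde{U}^{y[k]} U[k]\bigr)^{s}\, \hat{U}[k]^{\hat{s}} . \]

Also if $T = H^{s+\hat{s}}$ then $S = T \hat{g}^{v\hat{s}} = H^{s} (H \hat{g}^{v})^{\hat{s}} = H^{s} \hat{H}^{\hat{s}}$ as in PC, while if $T$ is random, so is $S$, as in $\mathrm{PC}_0$.
When $P$ makes query GetDK(id), adversary $B_1$ does the following: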

If $f(y, id) = 0$ then $dk \leftarrow \bot$
Else

$r', \hat{r}' \xleftarrow{\$} \mathbb{Z}_p$

£)l  <!_  g-f{yM)ur' gf{u,id)r' ]j-f(u,id)r'/f(y,id)  ■  ]J2  gf(u,id)r'ff-f(u,id)f'/f(y,id)jjur' 
Ds  Hr'lf{yM)g-r‘  .  Di  ^  >g-ur’  .  dA;  ^  (Dl,  D2,  D3,  DA) 


It returns $dk$ to $P$. We now show this key is properly distributed. Let $h$ be such that $H = g^h$ and let


hr' 


mod  p  and  r  =  ur'  mod  p  . 


t  tf(y, id) 

Since $t$, $f(y, id)$ are non-zero modulo $p$ and $r', \hat{r}'$ are random, $r, \hat{r}$ are random as well. The following
computes the correct secret key components with the above randomness and shows that they are the ones
of the simulation:


H 


n(V,id)trHtf  =  U[0]tr(nLi ^[k]id[k]tr) 

_  JJ-yl°]tr gu[0]tr  ^JJ-y[k]id[k]tr g\i[k\id[k]t 

_  JJ-f(ydd)trf(u,id)trjjtr 

—  JJ-f{yM){r'-hf'/f(y,id))gf{^M){r'-hf'/f{y,id))jjtur' 
—  fi-hur'  -f(y,id)ur'  f(u,id)r'  -f(u,id)hr’/ f(y, id) ghtur' 
=  g-f{y,id)ur'gf(u,id)r'ff- f (u,id)f' / f (y ,id)  _  jg ^ 


H{t],id)rHf'  =  U[0]r  (rifc=iUMid[fc] 


— tr 


—tr 


fjf  _  ~u[0]r  ^u[k]id[k]r^j  jjr 

~f(u,id)r fjr  _  gf(u,id)trjjr~ 

f(u,id){r' -hr' / f{y,id))  pjuf  _  f(u,id)r'jj--f(u,id)f'/f(y,id)j^ur'  _ 
ghr'/f(y,id)-r'  _  jjr'/f{y,id)g-r'  _  ^ 


=  9 


—tur' 


=  9 


=  d4  . 


Finally adversary $P$ outputs $d'$. Adversary $B_1$ also outputs $d'$, so we have Equation (11).

On input $(g, \hat{g}, g^s, \hat{g}^{\hat{s}}, \tilde{U}, T)$ where $T$ is either $\tilde{U}^{s+\hat{s}}$ or random, adversary $B_2$ runs adversary $P$, responding
to its oracle queries as follows. When $P$ makes query Initialize($y$), adversary $B_2$ lets


l  A{0,...,/4;  u,  u  A  Zp+1  ;  u,h,h  A  Zp  ;  H  <-  gh  ;  H  <-  gh  ;  U  f-  gb 
For  k  =  0, . . . ,//  do  U [k\  <-  U^l,k)gu[k]  .  u[jfe]  <-  UA(kk)^u[k] 


It returns $(g, \hat{g}, H, \hat{H}, U, \hat{U}, \tilde{U})$ to $P$. When $P$ makes its (single) Ch() query, adversary $B_2$ lets
$S \xleftarrow{\$} \mathbb{G}$

For $k = 0, \ldots, l - 1$ do $Z[k] \xleftarrow{\$} \mathbb{G}$

For  k  =  l, . . .  ,n  do  Z [k]  <-  (0*)«yW+uW^)fl[%A(y=) 


It returns $(g^s, \hat{g}^{\hat{s}}, S, Z)$ to $P$. Notice that for $l + 1 \le k \le \mu$

Z [k\  =  =  f7sy^U[/fc]sU[£f  =  ([/yWu[fc])sU[fc]®  . 

If  T  =  Us+S  then 

Z  [l]  =  (^)«yW+uH(^)u[i]r  = 

as in game $\mathrm{PC}_l$. On the other hand if $T$ is random then so is $Z[l]$, as in game $\mathrm{PC}_{l+1}$. When $P$ makes
query GetDK(id), adversary $B_2$ does the following:

If $f(y, id) = 0$ then $dk \leftarrow \bot$
Else


.$  rn 

r,  r  <—  Lp 

JJ1  l jf(u,id)r~hr '  gf(u,id)r£jid{L]rghr'jj-hid{l]r/h 

A  <-  g~r ;  A  <-  ;  <_  (d1,d2,d3,d4) 


It returns $dk$ to $P$. We now show this key is properly distributed. Let $\tilde{u}$ be such that $\tilde{U} = g^{\tilde{u}}$ and let

r1  id[l]ur 

r  = - : —  mod  p  . 

t  th  F 

Since $t$ is non-zero modulo $p$ and $\hat{r}'$ is random, $\hat{r}$ is random as well. The following computes the correct
secret key components with the above randomness and shows that they are the ones of the simulation:


U(\J,id)trHtf 


H{ti,  id)rHf 


g 

g 


—tr 


—tr 


H 


xhtr 


u[(T  (nLiU^P^ 

u[0 ]tr  ( Tit1  fTid[k]trA(l,k)  u[k]id[k]tr\ 

9  ^Ilfc=i^  9  J  9 

gf(u,id)tr^jid[l]tr^htr  _  gf(u,id)tr  jjid[l]tr  ~h(r' — id[l\ur/h) 

£jf(\i,id)r  £j-id[l]tr  ~ hr'  ~  — id[l]ur  ~f(u,id)rgid[l\urt~hr'~  —  id[l]ur 


gPu.id)r~hr'  =  Di 


XJ[0]r  Hf  =  (jJ^_^JJid[k}rA(l’k)gU[k]id[k]r^  ~hr 


gf(u,id)r^jid[l\r  thf  _  gf(u,id)r^jid[l\r  h(f'-id[l\ur/h) 

gf(u,id)r^jid[l]rghf'g—hid[l]ur/h  _  gf(u,id)r -Q-id[l]r ghf  jj—hid[l]r /h  _  jj 

g~r  =  A 

gUrid[l\/h—r’  _  jjid[l\r/h  -f1  _  jj ^  _ 


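Finally adversary $P$ outputs $d'$. Adversary $B_2$ also outputs $d'$. So

\begin{align*}
\mathbf{Adv}^{\mathrm{dlin}}(B_2)
 &= \frac{1}{\mu+1} \sum_{l=0}^{\mu} \Bigl( \Pr[\mathrm{PC}_l^P] - \Pr[\mathrm{PC}_{l+1}^P] \Bigr) \\
 &= \frac{1}{\mu+1} \Bigl( \Pr[\mathrm{PC}_0^P] - \Pr[\mathrm{PC}_{\mu+1}^P] \Bigr)
\end{align*}

and we have Equation (12).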

4.8  Selective-id  security 

We consider IBTDF $F[n, 1, \mathbb{Z}_p]$, the instance of our construction with $\mu = 1$ and IDSp $= \mathbb{Z}_p$. We show
that this IBTDF is selective-id $\delta$-lossy for $\delta = 1$, meaning fully selective-id lossy, and hence selective-id
one-way. To do this we define a sibling $LF[n, 1, \mathbb{Z}_p]$. It preserves the key-generation, evaluation and
inversion algorithms of $F[n, 1, \mathbb{Z}_p]$ and alters parameter generation to

Algorithm $LF[n, 1, \mathbb{Z}_p].\mathrm{Pg}(id)$

$y \leftarrow (-id, 1)$ ; $(pars, msk) \xleftarrow{\$} E[n, 1, \mathbb{Z}_p].\mathrm{Pg}(y)$ ; Return $(pars, msk)$

The following says that our IBTDF is 1-lossy under the DLIN assumption with lossiness $\ell = n - 2\lg(p)$.

Theorem 4.3 Let $n > 2\lg(p)$ and let $\ell = n - 2\lg(p)$. Let $F = F[n, 1, \mathbb{Z}_p]$ be the IBTDF associated by
our construction to parameters $n$, $\mu = 1$ and IDSp $= \mathbb{Z}_p$. Let $LF = LF[n, 1, \mathbb{Z}_p]$ be the sibling associated
to it as above. Let $\delta = 1$ and let $A$ be a selective-id adversary. Then there is an adversary $B$ such that

\[ \mathbf{Adv}^{\delta\text{-los}}_{F, LF, \ell}(A) \le 2n(\mu + 2) \cdot \mathbf{Adv}^{\mathrm{dlin}}(B) . \qquad (13) \]

The running time of $B$ is that of $A$ plus overhead.

Proof of Theorem 4.3: On input $id$, let algorithm Aux return $(-id, 1)$. Let $\mathrm{RL}_0$, $\mathrm{RL}_n$ be the games
of Figure 5 with $\mu = 1$, IDSp $= \mathbb{Z}_p$ and this Aux. Then we claim

\[ \Pr[\mathrm{Real}_F^A] = \Pr[\mathrm{RL}_0^A] \quad\text{and}\quad \Pr[\mathrm{Lossy}_{F, LF, \ell}^A] = \Pr[\mathrm{RL}_n^A] . \qquad (14) \]

To justify this let $id^*$ be the identity queried by $A$ to both Initialize and Ch. (These queries are the
same because $A$ is selective-id.) Then $y_0 = (-id^*, 1)$ so $f(y_0, id) = id - id^*$. This is 0 iff $id = id^*$.
This means that the conjunct ($id^* \notin$ IS) $\wedge$ Win is always true. The claim of Equation (14) is now true
because game $\mathrm{RL}_0$ generates parameters with the real auxiliary input $y_1 = (1, 0) \in \mathbb{Z}_p^2$ that, via $E[n, 1, \mathbb{Z}_p]$,
defines $F$. However game $\mathrm{RL}_n$ generates parameters with auxiliary input $y_0$. Since $f(y_0, id^*) = 0$, the
dependency of $C_3[j]$ on $x[j]$ in Equation (5) vanishes when $id = id^*$. Examining Equations (3), (4), (5),
we now see that with pars fixed, the values $\langle s, x \rangle$, $\langle \hat{s}, x \rangle$ determine the ciphertext $(C_1, C_2, C_3, C_4)$.
Thus there are at most $p^2$ possible ciphertexts when $id = id^*$, and $2^n$ possible inputs. This means that
$\lambda(F.\mathrm{Ev}(pars, id^*, \cdot)) \ge n - \lg(p^2) = \ell$, which justifies the second claim of Equation (14). Recalling that
$\delta = 1$, Equation (13) follows from Equation (1), Equation (14), Lemma 4.2 and Lemma 4.1. ∎

4.9  Adaptive-id  Security 

We consider IBTDF $F[n, \mu, \{0,1\}^{\mu}]$, the instance of our construction with IDSp $= \{0,1\}^{\mu} \subseteq \mathbb{Z}_p^{\mu}$. We show
that this IBTDF is adaptive-id $\delta$-lossy for $\delta = (4(\mu + 1)Q)^{-1}$ where $Q$ is the number of key-derivation
queries of the adversary. By Theorem 3.2 this means $F[n, \mu, \{0,1\}^{\mu}]$ is adaptive-id one-way. To do this
we define a sibling $LF_Q[n, \mu, \{0,1\}^{\mu}]$. It preserves the key-generation, evaluation and inversion algorithms
of $F[n, \mu, \{0,1\}^{\mu}]$ and alters parameter generation to $LF_Q[n, \mu, \{0,1\}^{\mu}].\mathrm{Pg}(id)$ defined via

$y \xleftarrow{\$} \mathrm{Aux}$ ; $(pars, msk) \xleftarrow{\$} E[n, \mu, \{0,1\}^{\mu}].\mathrm{Pg}(y)$ ; Return $(pars, msk)$

where algorithm Aux is defined via

$y'[0] \xleftarrow{\$} \{0, \ldots, 2Q - 1\}$ ; $k \xleftarrow{\$} \{0, \ldots, \mu + 1\}$ ; $y[0] \leftarrow y'[0] - 2kQ$
For $i = 1$ to $\mu$ do $y[i] \xleftarrow{\$} \{0, \ldots, 2Q - 1\}$

Return $y \in \mathbb{Z}_p^{\mu+1}$
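
The two auxiliary-input generators (the selective-id one of Section 4.8 and the adaptive-id one above) can be sketched in a few lines of Python. This is an illustration only: the form of `f`, namely $f(y, id) = y[0] + \sum_{i \ge 1} y[i] \cdot id[i] \bmod p$, is an assumption inferred from the identity $f((-id^*, 1), id) = id - id^*$ used in Section 4.8, and the function names and the variable name `k` are ours.

```python
import random

def f(y, ident, p):
    """Assumed form of f: y[0] + sum_i y[i] * id[i] (mod p); ident has length mu."""
    return (y[0] + sum(yi * idi for yi, idi in zip(y[1:], ident))) % p

def aux_selective(ident, p):
    """Aux of the selective-id case (mu = 1): returns (-id, 1) over Z_p."""
    return [(-ident) % p, 1]

def aux_adaptive(mu, Q, p, rng=random):
    """Aux of the adaptive-id case; k is sampled from {0, ..., mu + 1}."""
    y0_prime = rng.randrange(2 * Q)          # y'[0] in {0, ..., 2Q - 1}
    k = rng.randrange(mu + 2)                # k in {0, ..., mu + 1}
    y = [(y0_prime - 2 * k * Q) % p]         # y[0] = y'[0] - 2kQ, reduced mod p
    y += [rng.randrange(2 * Q) for _ in range(mu)]
    return y
```

In the selective-id case `f(aux_selective(id_star, p), [ident], p)` is zero exactly when `ident == id_star`, which is the property the proof of Theorem 4.3 relies on.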

The following says that our IBTDF is $\delta$-lossy under the DLIN assumption with lossiness $\ell = n - 2\lg(p)$.

Theorem 4.4 Let $n > 2\lg(p)$ and let $\ell = n - 2\lg(p)$. Let $F = F[n, \mu, \{0,1\}^{\mu}]$ be the IBTDF associated by
our construction to parameters $n$, $\mu$ and IDSp $= \{0,1\}^{\mu}$. Let $A$ be an adaptive-id adversary that makes
a maximal number of $Q < p/(3\mu)$ queries and let $\delta = (8(\mu + 1)Q)^{-1}$. Let $LF = LF_Q[n, \mu, \{0,1\}^{\mu}]$ be the
sibling associated to $F$, $A$ as above. Then there is an adversary $B$ such that

\[ \mathbf{Adv}^{\delta\text{-los}}_{F, LF, \ell}(A) \le 2n(\mu + 2) \cdot \mathbf{Adv}^{\mathrm{dlin}}(B) . \qquad (15) \]

The running time of $B$ is that of $A$ plus $O(\mu^2 \rho^{-1}((\mu Q \rho)^{-1}))$ overhead, where $\rho = \frac{1}{2} \cdot \mathbf{Adv}^{\delta\text{-los}}_{F, LF, \ell}(A)$.

Proof of Theorem 4.4: Our proof uses a simulation technique due to Waters [62]. We use a slightly
improved analysis from [42]. Let $Q$ be the number of queries made by $A$ and let algorithm Aux be defined
as above. Let $\mathrm{RL}_0$, $\mathrm{RL}_n$ be the games of Figure 5 with IDSp $= \{0,1\}^{\mu}$ and this Aux. Let $E(\mathrm{IS}, id^*)$ denote
the event that when proc Finalize($d'$) is called in $\mathrm{RL}_0^A$ the flag Win $\leftarrow$ false is set and $id^* \notin$ IS. (Note
that $\eta(\mathrm{IS}, id^*)$ only depends on IS, $id^*$ since $y_0$ is exclusively used to set Win $\leftarrow$ false.) Let $\eta(\mathrm{IS}, id^*)$ be
the probability that $E(\mathrm{IS}, id^*)$ happens. In [32, Lemma 6.2], it was shown (using purely combinatorial
arguments) that $\lambda_{\mathrm{low}} := \frac{1}{4(\mu+1)Q} \le \eta(\mathrm{IS}, id^*) \le \frac{1}{2Q} =: \lambda_{\mathrm{up}}$. Since $\mathrm{RL}_0^A$ and $\mathrm{Real}_F^A$ are only different
when $E(\mathrm{IS}, id^*)$ happens, one would like to argue that $\lambda_{\mathrm{low}} \cdot \Pr[\mathrm{Real}_F^A] = \Pr[\mathrm{RL}_0^A]$ but this is not
true since $E(\mathrm{IS}, id^*)$ and $\mathrm{Real}_F^A$ may not be independent. To get rid of this unwanted dependence we
consider a modification of $\mathrm{RL}_0$ and $\mathrm{RL}_n$ which adds some artificial abort such that in total it always sets
Win $\leftarrow$ false with probability around $1 - \lambda_{\mathrm{low}}$, independent of the view of the adversary. (Since, given
IS, $id^*$, the exact value of $\eta(\mathrm{IS}, id^*)$ cannot be computed efficiently, it needs to be approximated using
sampling.) Concretely, games $\overline{\mathrm{RL}}_0$ and $\overline{\mathrm{RL}}_n$ are defined as $\mathrm{RL}_0$ and $\mathrm{RL}_n$, respectively, the only difference
being Finalize which is defined as follows.

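proc Finalize($d'$)  // $\overline{\mathrm{RL}}_0$, $\overline{\mathrm{RL}}_n$

Compute an approximation $\eta'(\mathrm{IS}, id^*)$ of $\eta(\mathrm{IS}, id^*)$

If $\eta'(\mathrm{IS}, id^*) > \lambda_{\mathrm{low}}$ then set Win $\leftarrow$ false with probability $1 - \lambda_{\mathrm{low}}/\eta'(\mathrm{IS}, id^*)$

Return (($d' = 1$) and ($id^* \notin$ IS) and Win)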


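We refer to [32] for details on how to compute the approximation $\eta'(\mathrm{IS}, id^*)$. Using [32, Lemma 6.3], one
can show that if we use $O(\mu^2 \rho^{-1}((\mu Q \rho)^{-1}))$ samples to compute the approximation $\eta'(\mathrm{IS}, id^*)$, then

\[ \Bigl| \Pr[\mathrm{Real}_F^A] - \lambda_{\mathrm{low}}^{-1} \cdot \Pr[\overline{\mathrm{RL}}_0^A] \Bigr| \le \rho . \qquad (16) \]

Setting $\rho = \frac{1}{2} \cdot \Pr[\mathrm{Real}_F^A]$ we obtain

\[ \delta \cdot \Pr[\mathrm{Real}_F^A] \le \Pr[\overline{\mathrm{RL}}_0^A] , \qquad (17) \]

where $\delta = \lambda_{\mathrm{low}}/2$ is as in the theorem statement. As in the proof of Theorem 4.3 we can show that

\[ \Pr[\mathrm{Lossy}_{F, LF, \ell}^A] = \Pr[\overline{\mathrm{RL}}_n^A] . \qquad (18) \]

Now Equation (15) follows from Equations (1), (17), (18), Lemma 4.2 and (a version incorporating the
artificial abort of) Lemma 4.1. ∎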


We  remark  that  we  could  use  the  proof  technique  of  |T2]  which  avoids  the  artificial  abort  but  this  increases 
the  value  of  <5,  making  it  dependent  on  the  adversary  advantage.  The  proof  technique  of  m  could  be 
used to strengthen $\delta$ in Theorem 4.4 to $O(\sqrt{\mu}\, Q)^{-1}$, which is close to the optimal value $Q^{-1}$.


5  IB-TDFs  from  Lattices 

Here  we  give  a  construction  of  a  lossy  IB-TDF  from  lattices,  specifically,  the  LWE  assumption.  We  note 
that  a  one-way  IB-TDF  can  already  be  derived  by  applying  methods  from  [29,  2]  to  the  LWE-based 
injective  (not  identity-based)  trapdoor  function  from  [35] . 

LWE  is  a  particular  type  of  average-case  BDD/GapSVP  problem.  It  has  been  recognized  since  [50] 
that  GapSVP  (and  BDD  [35])  induces  a  form  of  lossiness.  So  there  is  folklore  that  the  GPV  LWE-based 
TDF  can  be  made  to  satisfy  some  meaningful  notion  of  lossiness  (specifically,  for  an  appropriate  input 
distribution,  the  output  does  not  reveal  the  entire  input  statistically)  by  replacing  its  normally  uniformly 
random  key  with  an  LWE  (BDD/GapSVP)  instance.  However,  a  full  construction  and  proof  according 
to  the  standard  notion  of  lossiness  (which  compares  the  domain  and  images  sizes  of  the  function)  have 
not  yet  appeared  in  the  literature,  and  there  are  many  quantitative  issues  to  address. 

In  this  section  we  construct  an  (ID-based)  TDF  that  is  lossy  for  a  natural  (uniform)  input  distribution. 
We  favor  simplicity  of  analysis  at  the  expense  of  tight  bounds,  so  our  construction  is  highly  unoptimized 
and  should  be  seen  mainly  as  a  proof  of  feasibility.  Much  tighter  constructions  and  bounds  can  be 
achieved using more sophisticated machinery from the literature.

5.1 Background


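For a real matrix $\mathbf{X}$, we let $s_1(\mathbf{X})$ denote its largest singular value (also known as spectral norm), i.e.,
$s_1(\mathbf{X}) = \max_{\mathbf{y} \ne 0} \|\mathbf{X}\mathbf{y}\| / \|\mathbf{y}\|$. It is easy to verify that the spectral norm satisfies the triangle inequality
$s_1(\mathbf{X} + \mathbf{Y}) \le s_1(\mathbf{X}) + s_1(\mathbf{Y})$ and is submultiplicative, $s_1(\mathbf{X}\mathbf{Y}) \le s_1(\mathbf{X})\, s_1(\mathbf{Y})$. Throughout this section we let $n$ be the
main security parameter, and let $\omega(\sqrt{\log n})$ denote a fixed function that grows asymptotically faster than
$\sqrt{\log n}$.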
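
As a small numerical illustration of the two spectral-norm inequalities, the sketch below computes $s_1$ for real $2 \times 2$ matrices in closed form, using the fact that $s_1(\mathbf{X})^2$ is the larger eigenvalue of $\mathbf{X}^t\mathbf{X}$. The helper names and the example matrices are ours and serve only as a sanity check, not as part of the construction.

```python
import math

def spectral_norm_2x2(X):
    """Largest singular value of a real 2x2 matrix via the eigenvalues of X^T X."""
    (a, b), (c, d) = X
    trace = a * a + b * b + c * c + d * d       # trace(X^T X)
    det = (a * d - b * c) ** 2                  # det(X^T X) = det(X)^2
    disc = max(trace * trace - 4.0 * det, 0.0)  # guard against rounding below zero
    return math.sqrt((trace + math.sqrt(disc)) / 2.0)

def add(X, Y):
    """Entrywise sum of two 2x2 matrices."""
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def mul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

X = [[3.0, 0.0], [0.0, 1.0]]
Y = [[0.0, 2.0], [-1.0, 0.0]]
triangle_ok = spectral_norm_2x2(add(X, Y)) <= spectral_norm_2x2(X) + spectral_norm_2x2(Y) + 1e-12
product_ok = spectral_norm_2x2(mul(X, Y)) <= spectral_norm_2x2(X) * spectral_norm_2x2(Y) + 1e-12
```

For these particular matrices the product bound is met with equality, which shows that submultiplicativity cannot be improved in general.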

Probability distributions. The discrete Gaussian distribution with parameter $s > 0$ over the integers
$\mathbb{Z}$, written $D_{\mathbb{Z},s}$, assigns probability proportional to $\exp(-\pi x^2/s^2)$ to each $x \in \mathbb{Z}$ (and probability zero
elsewhere). It is extended to a product distribution over $\mathbb{Z}^n$ in the natural way, i.e., $D_{\mathbb{Z}^n,s} = D_{\mathbb{Z},s}^n$.

We say that a random variable $X$ over $\mathbb{R}$ is subgaussian with parameter $s$ if for all $t \ge 0$, we
have $\Pr[|X| \ge t] \le 2\exp(-\pi t^2/s^2)$. More generally, we say that a random vector $\mathbf{x}$ (respectively, a
random matrix $\mathbf{X}$) or its distribution is subgaussian of parameter $s$ if all its one-dimensional marginals
$\langle \mathbf{x}, \mathbf{u} \rangle$ (respectively, $\mathbf{u}^t \mathbf{X} \mathbf{v}$) for unit vectors $\mathbf{u}, \mathbf{v}$ are subgaussian of parameter $s$. The concatenation of $n$
independent subgaussian variables with common parameter $s$, interpreted as either a vector or matrix, is
also subgaussian with parameter $s$. It is also known that $D_{\mathbb{Z},s}$ is subgaussian with parameter $s$ (see [45,
Lemma 2.8]). We need the following standard fact from random matrix theory (see, e.g., [60]).
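
To make the definitions concrete, the following sketch tabulates $D_{\mathbb{Z},s}$ on a truncated support and compares its tails with the subgaussian bound $2\exp(-\pi t^2/s^2)$. The truncation at $|x| \le 12s$ is an assumption made for the illustration (the mass beyond it is negligible at double precision), and the function names are ours.

```python
import math

def discrete_gaussian_pmf(s, tail=12):
    """Return {x: Pr[x]} for D_{Z,s}, truncated to |x| <= tail * s."""
    bound = int(math.ceil(tail * s))
    weights = {x: math.exp(-math.pi * x * x / (s * s)) for x in range(-bound, bound + 1)}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def tail_probability(pmf, t):
    """Pr[|X| >= t] under the tabulated distribution."""
    return sum(prob for x, prob in pmf.items() if abs(x) >= t)

def subgaussian_bound(s, t):
    """Right-hand side of the subgaussian tail condition with parameter s."""
    return 2.0 * math.exp(-math.pi * t * t / (s * s))

pmf = discrete_gaussian_pmf(4.0)
tails_ok = all(tail_probability(pmf, t) <= subgaussian_bound(4.0, t) for t in range(0, 21))
```

The check only samples integer thresholds for one parameter, so it illustrates the cited lemma rather than proving it.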

Lemma 5.1 For a random matrix $\mathbf{X} \in \mathbb{R}^{h \times w}$ that is subgaussian with parameter $s$, we have $s_1(\mathbf{X}) =
s \cdot O(\sqrt{h} + \sqrt{w})$ except with probability $2^{-\Omega(h+w)}$.

Lattices and LWE. Throughout the remainder of this section we let $q = q(n)$ denote a prime, and $\mathbb{Z}_q$
denote the ring of integers modulo $q$. It is possible to generalize our constructions to moduli of other forms
(e.g., prime powers) using known facts from the literature (see, e.g., [36]), but this somewhat complicates
the constructions and the statements of the bounds we use, so we stick with prime moduli for simplicity.

As in many recent papers, we work with a family of "$q$-ary" lattices (and their cosets), represented by
parity-check matrices $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$. The precise definition of these lattices will not be needed in this work,
so we omit it and refer the interested reader to, e.g., [36] for details. The following lemma is a special case
of  [361  Lemma  5.3]  and  [35]  Lemma  2.4],  and  the  properties  of  the  “smoothing  parameter”  (see  [3T1.  55] ). 

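Lemma 5.2 For prime $q$ and integer $b \ge 2$, let $\bar{m} \ge n \log_b q + \omega(\log n)$. With overwhelming probability
over the uniformly random choice of $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n \times \bar{m}}$, the following holds: for $\mathbf{r} \leftarrow D_{\mathbb{Z},\, b \cdot \omega(\sqrt{\log n})}^{\bar{m}}$, the
distribution of $\bar{\mathbf{A}}\mathbf{r} \in \mathbb{Z}_q^n$ is negl($n$)-far from uniform.

Note that by the triangle inequality for statistical distance, the above statement also holds where $\mathbf{r}$ is
replaced by $\mathbf{R} \leftarrow D_{\mathbb{Z},\, b \cdot \omega(\sqrt{\log n})}^{\bar{m} \times w}$, and $\bar{\mathbf{A}}\mathbf{r} \in \mathbb{Z}_q^n$ with $\bar{\mathbf{A}}\mathbf{R} \in \mathbb{Z}_q^{n \times w}$, for any $w = \mathrm{poly}(n)$.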

The (decisional) learning with errors (LWE) problem [53] in dimension $n$ with error rate $\alpha \in (0,1)$,
stated in matrix form, is: given an input $(\mathbf{A}, \mathbf{b}) \in \mathbb{Z}_q^{n \times m} \times \mathbb{Z}_q^m$ (for any $m = \mathrm{poly}(n)$) where $\mathbf{A}$ is uniformly
random, and $\mathbf{b}$ is either of the form $\mathbf{b}^t = \mathbf{x}^t \begin{bmatrix} \mathbf{I}_m \\ \mathbf{A} \end{bmatrix} \bmod q$ for $\mathbf{x} \leftarrow D_{\mathbb{Z},\alpha q}^{m+n}$, or is uniformly random and
independent of $\mathbf{A}$, distinguish which is the case with non-negligible advantage.^1 By a routine hybrid
argument, replacing $\mathbf{x}$ with a matrix $\mathbf{X}$ having any number $w = \mathrm{poly}(n)$ of independent columns (each
drawn from $D_{\mathbb{Z},\alpha q}^{m+n}$), and replacing $\mathbf{b}^t$ with either $\mathbf{B}^t = \mathbf{X}^t \begin{bmatrix} \mathbf{I}_m \\ \mathbf{A} \end{bmatrix} \bmod q$ or a uniformly random $\mathbf{B}$ of the
same dimension, yields an equivalent problem (up to a $w$ factor in the adversary's advantage). When
$\alpha q \ge 2\sqrt{n}$, this decision problem is at least as hard as approximating several problems on $n$-dimensional
lattices in the worst case to within $\tilde{O}(n/\alpha)$ factors with a quantum algorithm [53], or via a classical
algorithm for a subset of these problems [55].

^1 This is actually the "normal form" of the LWE problem, which is equivalent to the one from [53] in which the portion
of $\mathbf{x}$ that is multiplied by $\mathbf{A}^t$ is uniformly random in $\mathbb{Z}_q^n$; see, e.g., [5]. In addition, for simplicity of analysis we use a true
discrete Gaussian error distribution $D_{\mathbb{Z},\alpha q}$ instead of a "rounded" continuous Gaussian as in [53]; hardness for this error
distribution is implied by the results of m-

Trapdoors for lattices. We recall the notion and efficient construction of a (strong) trapdoor for $q$-ary
lattices, due recently to Micciancio and Peikert [46]. This construction uses a public "gadget" matrix $\mathbf{G}$
over $\mathbb{Z}_q$, defined as

\[ \mathbf{G} = \mathbf{I}_n \otimes [1, b, b^2, \ldots, b^{w-1}] \in \mathbb{Z}_q^{n \times nw} \qquad (19) \]

for some integer base $b \ge 2$ and $w = \lceil \log_b q \rceil$. (Note that [46] mainly focuses on the case $b = 2$; in our
constructions we will need to take $b$ to be larger, but still constant.)
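
The role of the gadget matrix can be illustrated with a short sketch. It assumes the usual block-diagonal form $\mathbf{G} = \mathbf{I}_n \otimes [1, b, \ldots, b^{w-1}]$ with $w = \lceil \log_b q \rceil$, for which base-$b$ digit decomposition gives a short preimage of any target vector. The function names and the toy parameters are ours.

```python
def gadget_width(q, b):
    """Smallest w with b**w >= q, i.e. w = ceil(log_b q), using integers only."""
    w, power = 0, 1
    while power < q:
        power *= b
        w += 1
    return w

def gadget_matrix(n, q, b):
    """n x (n*w) matrix I_n (tensor) [1, b, ..., b^(w-1)], entries reduced mod q."""
    w = gadget_width(q, b)
    g = [pow(b, j, q) for j in range(w)]
    return [[g[c - r * w] if r * w <= c < (r + 1) * w else 0 for c in range(n * w)]
            for r in range(n)]

def decompose(u, q, b):
    """Base-b digits of each entry of u, concatenated: a short x with G x = u mod q."""
    w = gadget_width(q, b)
    digits = []
    for value in u:
        value %= q
        for _ in range(w):
            digits.append(value % b)
            value //= b
    return digits

def apply_gadget(G, x, q):
    """Compute G x mod q for a digit vector x."""
    return [sum(G[r][c] * x[c] for c in range(len(x))) % q for r in range(len(G))]
```

For example, with `q = 97`, `b = 3` and `n = 2`, `apply_gadget(gadget_matrix(2, 97, 3), decompose([50, 96], 97, 3), 97)` recovers `[50, 96]` from digits that all lie in `{0, 1, 2}`.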

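Following [56], we say that an integer matrix $\mathbf{R} \in \mathbb{Z}^{(m - nw) \times nw}$ is a trapdoor with tag $\mathbf{H} \in \mathbb{Z}_q^{n \times n}$ for
$\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ if $\mathbf{A} \begin{bmatrix} \mathbf{R} \\ \mathbf{I} \end{bmatrix} = \mathbf{H} \cdot \mathbf{G}$. In our constructions, $\mathbf{H}$ will always be either an invertible matrix, or the
zero matrix. The trapdoor generation algorithm of [35] works for any $m \ge n(\log_b q + w) + \omega(\log n)$ and
generates a nearly uniform $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, together with a trapdoor $\mathbf{R}$ (with a desired tag $\mathbf{H}$) for $\mathbf{A}$. Letting
$\bar{m} = m - nw \ge n \log_b q + \omega(\log n)$, it chooses $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n \times \bar{m}}$ uniformly at random, chooses $\mathbf{R} \leftarrow D_{\mathbb{Z},\, b \cdot \omega(\sqrt{\log n})}^{\bar{m} \times nw}$,
and lets $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{H} \cdot \mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$. It is clear by inspection that $\mathbf{R}$ is a trapdoor for $\mathbf{A}$, and by Lemma 5.2
the distribution of $\mathbf{A}$ is negl($n$)-far from uniform.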
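
The "clear by inspection" step can be checked mechanically on toy parameters. The sketch below builds $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{H}\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$ and verifies $\mathbf{A}\begin{bmatrix}\mathbf{R}\\ \mathbf{I}\end{bmatrix} = \mathbf{H}\mathbf{G} \bmod q$. It is only an illustration: the entries of $\mathbf{R}$ are drawn uniformly from $\{-1, 0, 1\}$ as a stand-in for the discrete Gaussian, the gadget is assumed to be $\mathbf{I}_n \otimes [1, b, \ldots, b^{w-1}]$, and the parameters are far too small for security.

```python
import random

def matmul_mod(X, Y, q):
    """Matrix product X * Y with entries reduced mod q."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]

def trapdoor_instance(n, m_bar, w, q, b, H, rng):
    """Return (A, R, HG) with A = [A_bar | H*G - A_bar*R] mod q."""
    g = [pow(b, j, q) for j in range(w)]
    G = [[g[c - r * w] if r * w <= c < (r + 1) * w else 0 for c in range(n * w)]
         for r in range(n)]
    A_bar = [[rng.randrange(q) for _ in range(m_bar)] for _ in range(n)]
    # Assumption: small uniform entries stand in for the discrete Gaussian R.
    R = [[rng.randint(-1, 1) for _ in range(n * w)] for _ in range(m_bar)]
    HG = matmul_mod(H, G, q)
    AR = matmul_mod(A_bar, R, q)
    right = [[(HG[i][j] - AR[i][j]) % q for j in range(n * w)] for i in range(n)]
    A = [A_bar[i] + right[i] for i in range(n)]
    return A, R, HG

def check_trapdoor(A, R, HG, q):
    """Verify A * [R ; I] = H*G (mod q)."""
    nw = len(R[0])
    identity = [[1 if i == j else 0 for j in range(nw)] for i in range(nw)]
    stacked = [row[:] for row in R] + identity
    return matmul_mod(A, stacked, q) == HG
```

The same check passes for the zero tag, which is the other case the text allows for $\mathbf{H}$.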

We recall two of the main operations enabled by a trapdoor: inversion of the (injective) LWE function
$g_{\mathbf{A}}(\mathbf{x}) := \mathbf{x}^t \begin{bmatrix} \mathbf{I}_m \\ \mathbf{A} \end{bmatrix} \bmod q$ for "short" integer vectors $\mathbf{x}$, and delegation of a trapdoor for an extended
parity-check matrix.

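Lemma 5.3 ([46]) Let $\mathbf{R}$ be a trapdoor with any invertible tag $\mathbf{H} \in \mathbb{Z}_q^{n \times n}$ for $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, using a gadget
matrix $\mathbf{G}$ with base $b \ge 2$. There are efficient algorithms Invert and DelTrap that do the following:

1. For $\mathbf{b}^t = g_{\mathbf{A}}(\mathbf{x}) := \mathbf{x}^t \begin{bmatrix} \mathbf{I}_m \\ \mathbf{A} \end{bmatrix} \bmod q$ where $\mathbf{x} \in \mathbb{Z}^{m+n}$ is such that $\|\mathbf{x}\| < q/\Theta(b \cdot s_1(\mathbf{R}))$, the algorithm
Invert($\mathbf{R}, \mathbf{A}, \mathbf{b}$) outputs $\mathbf{x}$.

2. For any invertible tag $\mathbf{H}'$, matrix $\mathbf{A}' \in \mathbb{Z}_q^{n \times nw}$, and any sufficiently large $s = \Omega(b \cdot s_1(\mathbf{R})) \cdot \omega(\sqrt{\log n})$,
the algorithm DelTrap($\mathbf{R}, [\mathbf{A} \mid \mathbf{A}'], \mathbf{H}', s$) outputs a trapdoor $\mathbf{R}'$ with tag $\mathbf{H}'$ for $[\mathbf{A} \mid \mathbf{A}']$, where $\mathbf{R}'$
has the same distribution (up to negl($n$) statistical distance) for any trapdoor $\mathbf{R}$ satisfying the above
bound on $s_1(\mathbf{R})$, and $s_1(\mathbf{R}') = O(\sqrt{m})$ with overwhelming probability.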


5.2  Our  basic  trapdoor  function 

Let $c > 1$ and integer base $b \ge 2$ be constants to be determined later in the analysis, and let $n = c\bar{n}$,
$m \ge n \log_b q = c\bar{n} \log_b q$ be integers. Define $I_\beta = \{0, 1, \ldots, \beta - 1\}$ and $I_\gamma$ similarly for some positive integers
$\beta \ge \gamma$ to be determined later. (The analysis also goes through unchanged for $I_\beta = \{-\beta, \ldots, \beta - 1\}$ and
$I_\gamma$ defined similarly.)

1. Parameters: The public parameter pars is a matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ (which will be close to uniform, either
statistically or computationally), and the trapdoor msk is a trapdoor $\mathbf{R}$ (for any invertible tag $\mathbf{H}$) for $\mathbf{A}$
with bounded $s_1(\mathbf{R})$. For a sufficiently large $m = \Omega(n \log_b q)$, these can be created using the trapdoor
generation algorithm described above, or via the DelTrap algorithm from Item 2 of Lemma 5.3.

2. Evaluate: Given parameter $\mathbf{A}$ and input $\mathbf{x} \in I_\beta^{m + \bar{n}} \times I_\gamma^{n - \bar{n}}$, algorithm LWE.Ev outputs

\[ \mathbf{b}^t = g_{\mathbf{A}}(\mathbf{x}) := \mathbf{x}^t \begin{bmatrix} \mathbf{I}_m \\ \mathbf{A} \end{bmatrix} \bmod q . \]

3. Invert: Given parameter $\mathbf{A}$, trapdoor $\mathbf{R}$ and output $\mathbf{b}$, algorithm $\mathrm{LWE.Ev}^{-1}$ returns $\mathbf{x}$ using the
inversion algorithm from Item 1 of Lemma 5.3.
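
The evaluation step is simple enough to sketch directly. The code below is an illustration under stated assumptions: it reads the input as $\mathbf{x} = (\mathbf{e}, \mathbf{s})$ with $\mathbf{e} \in \mathbb{Z}^m$ and $\mathbf{s} \in \mathbb{Z}^n$, so that $\mathbf{x}^t \begin{bmatrix}\mathbf{I}_m\\ \mathbf{A}\end{bmatrix} = \mathbf{e}^t + \mathbf{s}^t\mathbf{A}$, and it checks the domain by taking the first $m + \bar{n}$ coordinates from $I_\beta$ and the remaining $n - \bar{n}$ from $I_\gamma$. The function name and argument order are ours; inversion is not shown because it depends on the Invert algorithm of Lemma 5.3.

```python
def lwe_eval(A, x, q, m_plus_nbar, beta, gamma):
    """g_A(x) = x^t [I_m ; A] mod q for A in Z_q^{n x m} and x of length m + n."""
    n, m = len(A), len(A[0])
    if len(x) != m + n:
        raise ValueError("x must have length m + n")
    head, tail = x[:m_plus_nbar], x[m_plus_nbar:]
    if not all(0 <= v < beta for v in head) or not all(0 <= v < gamma for v in tail):
        raise ValueError("x is outside the domain I_beta^(m+n_bar) x I_gamma^(n-n_bar)")
    e, s = x[:m], x[m:]   # e pairs with I_m, s pairs with A
    return [(e[j] + sum(s[i] * A[i][j] for i in range(n))) % q for j in range(m)]
```

For instance, with `A = [[1, 2], [3, 4], [5, 6]]`, `q = 97`, `m_plus_nbar = 3`, `beta = 4`, `gamma = 2` and `x = [1, 0, 2, 1, 1]`, the output is `[11, 14]`.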

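The next lemma shows that when $\mathbf{A}$ has a particular non-uniform structure (without a trapdoor $\mathbf{R}$),
the function $g_{\mathbf{A}}$ is lossy when the parameters are set appropriately; we show how to do so after the proof.

Lemma 5.4 Suppose that $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ is such that

\[ \begin{bmatrix} \mathbf{I}_m \\ \mathbf{A} \end{bmatrix} = \begin{bmatrix} \mathbf{I}_{m+\bar{n}} \\ \mathbf{E}^t \end{bmatrix} \begin{bmatrix} \mathbf{I}_m \\ \bar{\mathbf{A}} \end{bmatrix} \]

for some $\bar{\mathbf{A}} \in \mathbb{Z}_q^{\bar{n} \times m}$ and $\mathbf{E}^t \in \mathbb{Z}^{(n - \bar{n}) \times (m + \bar{n})}$. Then for $\mathbf{x} \in I_\beta^{m+\bar{n}} \times I_\gamma^{n-\bar{n}}$, the number of distinct output
values $g_{\mathbf{A}}(\mathbf{x})$ is at most $O(\beta + \gamma \cdot s_1(\mathbf{E}))^{m+\bar{n}}$.

In particular, for large enough $\gamma^{c-1} \ge 2^{\Omega(m/\bar{n})}$ and $\beta \ge \gamma \cdot s_1(\mathbf{E})$, the function $g_{\mathbf{A}}$ is $\Omega(m)$-lossy.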


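Proof: Notice that

\[ g_{\mathbf{A}}(\mathbf{x}) = \mathbf{x}^t \begin{bmatrix} \mathbf{I}_m \\ \mathbf{A} \end{bmatrix} = \Bigl( \mathbf{x}^t \begin{bmatrix} \mathbf{I}_{m+\bar{n}} \\ \mathbf{E}^t \end{bmatrix} \Bigr) \begin{bmatrix} \mathbf{I}_m \\ \bar{\mathbf{A}} \end{bmatrix} \bmod q . \]

It therefore suffices to bound the number of possible values of the form $\mathbf{x}^t \begin{bmatrix} \mathbf{I}_{m+\bar{n}} \\ \mathbf{E}^t \end{bmatrix} \in \mathbb{Z}^{m+\bar{n}}$. By the triangle
inequality, we have

\[ \Bigl\| \mathbf{x}^t \begin{bmatrix} \mathbf{I}_{m+\bar{n}} \\ \mathbf{E}^t \end{bmatrix} \Bigr\| \le \beta \sqrt{m + \bar{n}} + s_1(\mathbf{E}) \cdot \gamma \sqrt{n - \bar{n}} \le \sqrt{m + \bar{n}} \cdot (\beta + \gamma \cdot s_1(\mathbf{E})) . \]

Define $N_d(r)$ to be the number of integer points in a $d$-dimensional Euclidean ball of radius $r$. For $r \ge \sqrt{d}$,
from the volume of the ball and Stirling's approximation, we have $N_d(r) = O(r/\sqrt{d})^d$. Therefore, the
number of possible values of the form $\mathbf{x}^t \begin{bmatrix} \mathbf{I}_{m+\bar{n}} \\ \mathbf{E}^t \end{bmatrix} \in \mathbb{Z}^{m+\bar{n}}$ is $O(\beta + \gamma \cdot s_1(\mathbf{E}))^{m+\bar{n}}$, as claimed.

For lossiness, observe that for our choice of $\gamma$, the base-2 logarithm of the domain size of $g_{\mathbf{A}}$ is

\[ (m + \bar{n}) \lg \beta + \bar{n} \lg \gamma^{c-1} \ge (m + \bar{n}) \lg \beta + \Omega(m) . \]

Whereas by the above, for $\beta \ge \gamma \cdot s_1(\mathbf{E})$ the base-2 logarithm of the image size of $g_{\mathbf{A}}$ is at most

\[ (m + \bar{n}) \lg O(\beta + \gamma \cdot s_1(\mathbf{E})) = (m + \bar{n}) \lg \beta + O(m) . \]

By choosing a sufficiently large universal constant in the above $\Omega(\cdot)$ expression, we have that the two
quantities above differ by $\Omega(m)$, as desired. ∎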
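
The counting argument can be observed on a toy instance by brute force. The sketch below builds $\begin{bmatrix}\mathbf{I}_m\\ \mathbf{A}\end{bmatrix}$ from the factorization in Lemma 5.4, enumerates the whole domain, and compares the domain size with the number of distinct intermediate vectors $\mathbf{x}^t\begin{bmatrix}\mathbf{I}_{m+\bar{n}}\\ \mathbf{E}^t\end{bmatrix}$ and with the image size. All parameter values and names are hypothetical and far too small to say anything quantitative about the asymptotic bound; the point is only that the image is determined by the intermediate vector and is therefore smaller than the domain.

```python
from itertools import product

def stack_identity(M, k):
    """Return the block matrix [I_k ; M], the k x k identity stacked on top of M."""
    identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    return identity + [row[:] for row in M]

def row_times(x, M, q=None):
    """Row vector times matrix, x^t M, reduced mod q when q is given."""
    out = [sum(x[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]
    return [v % q for v in out] if q is not None else out

# Toy parameters (hypothetical, far too small for any security).
q, m, n_bar, n = 97, 2, 1, 3
beta, gamma = 4, 2
A_bar = [[5, 11]]                       # n_bar x m
E_t = [[1, 0, 1], [0, 1, 1]]            # (n - n_bar) x (m + n_bar)

left = stack_identity(E_t, m + n_bar)   # [I_{m+n_bar} ; E^t]
right = stack_identity(A_bar, m)        # [I_m ; A_bar]
# [I_m ; A] := [I_{m+n_bar} ; E^t] [I_m ; A_bar] mod q, as in the lemma.
full = [row_times(row, right, q) for row in left]

domain = [list(h) + list(t)
          for h in product(range(beta), repeat=m + n_bar)
          for t in product(range(gamma), repeat=n - n_bar)]
intermediate = {tuple(row_times(x, left)) for x in domain}
image = {tuple(row_times(x, full, q)) for x in domain}
domain_size, distinct_z, image_size = len(domain), len(intermediate), len(image)
```

Here `image_size <= distinct_z < domain_size`, since, for example, the inputs `[1, 0, 1, 0, 0]` and `[0, 0, 0, 1, 0]` map to the same intermediate vector.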

We now discuss the constraints on the parameters and show how they can be instantiated. The constant $c$, base $b$, and integer $\gamma$ are chosen based on the relationship between $m$ and $n$. First, we need $\gamma^{c-1} \ge 2^{\Omega(m/n)}$ as required by Lemma 5.4. In order to generate $A$ with a trapdoor, we will have $m = \Theta(\bar{n} \log_b q) = \Theta(c n \log_b q)$, so we need $\gamma \ge q^{\Theta(1/\log b) \cdot c/(c-1)}$. For any desired constant $C > 1$, we can choose constants $c > 1$ and $b \ge 2$ so that $\gamma \le q^{1/C}$. Next, we choose $\beta$: to accommodate both the upper bound that suffices for invertibility (Item 1 of Lemma 5.3) and the lower bound on $\beta$ that suffices for $\Omega(m)$-lossiness (Lemma 5.4), it suffices to take

\[
q^{1/C} \cdot s_1(E) \le \beta \le q / \Omega(s_1(R) \cdot \sqrt{m}). \tag{20}
\]

These constraints can be satisfied for sufficiently large

\[
q^{1-1/C} \ge \Omega(s_1(R) \cdot s_1(E) \cdot \sqrt{m}). \tag{21}
\]

In all our instantiations, we will have (with $1 - \mathrm{negl}(n)$ probability) $s_1(R) = \mathrm{poly}(n)$ by the use of the trapdoor generation or delegation algorithms, and $s_1(E) = \mathrm{poly}(n)$ by the use of LWE with error distribution $D_{\mathbb{Z},\alpha q}$ for $\alpha q = \Omega(\sqrt{n})$ to generate a pseudorandom matrix $A$. Because $1 - 1/C > 0$ is a constant (which may even be chosen arbitrarily close to 1), we can choose a sufficiently large $q = \mathrm{poly}(n)$ so as to satisfy Equation (21), and can use an error rate of $\alpha = \Omega(\sqrt{n})/q = 1/\mathrm{poly}(n)$.
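The relationship between the window for $\beta$ in (20) and the condition on $q$ in (21) can be spelled out in one line. In the sketch below, $K$ is a label introduced here only for illustration; it stands for the constant hidden in the $\Omega(\cdot)$ of the upper bound on $\beta$. The window is non-empty exactly when its lower end does not exceed its upper end:

```latex
q^{1/C} \cdot s_1(E) \;\le\; \frac{q}{K \cdot s_1(R) \cdot \sqrt{m}}
\quad\Longleftrightarrow\quad
q^{1-1/C} \;\ge\; K \cdot s_1(R) \cdot s_1(E) \cdot \sqrt{m}.
```

Since the right-hand side is polynomial in $n$ whenever $s_1(R)$, $s_1(E)$ and $m$ are, and the exponent $1 - 1/C$ is a fixed positive constant, some polynomially large $q$ satisfies the condition.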

Remark 5.5 As a concrete (but non-identity-based) instantiation, consider a matrix $A$ having the form described in Lemma 5.4 where $\bar{A} \in \mathbb{Z}_q^{n \times m}$ is uniformly random and the entries of $E$ are chosen independently from $D_{\mathbb{Z},\alpha q}$, where $\alpha q$ is set so that we can invoke known worst-case hardness results for LWE. Then we have $s_1(R) = O(\sqrt{m}) \cdot \omega(\sqrt{\log n}) = \tilde{O}(\sqrt{n})$ and $s_1(E) = O(\sqrt{mn}) = \tilde{O}(n)$ with overwhelming probability, by subgaussianity of $D_{\mathbb{Z},\alpha q}$ and Lemma 5.1. Moreover, under the LWE assumption (in dimension $n$) with noise rate $\alpha$, such an $A$ is indistinguishable from uniform, which makes the lossy function $g_A$ indistinguishable from an invertible one.

Remark 5.6 Our constructions of ID-based lossy TDFs below involve two small variations on the above example. First, the trapdoor $D(id)$ for an identity will be delegated (using the DelTrap algorithm) from a trapdoor $R(id)$, derived from the master trapdoor $R$, for which $s_1(R(id)) \le \mathrm{poly}(n)$. So we will still have $s_1(D(id)) \le s_1(R(id)) \cdot \mathrm{poly}(n) = \mathrm{poly}(n)$. Second, in the lossy case, the hidden matrix $E$ in the structured matrix $A$ will no longer be Gaussian itself, but will be the product of some Gaussian $E'$ (of parameter $\alpha q$) and another matrix $X$ with $s_1(X) = \mathrm{poly}(n)$, so we will still have $s_1(E) = \mathrm{poly}(n)$ and can still instantiate all the parameters so that $q, 1/\alpha = \mathrm{poly}(n)$.


5.3  Our  id-based  lossy  trapdoor  function 

Setup. As above, let $c > 1$ and integer base $b \ge 2$ be constants to be determined later, and let $\bar{n} = cn$, $\bar{m} = \bar{n} \log_b q + \omega(\log n)$, and $m = \bar{m} + 2\bar{n}w$ where $w = \lceil \log_b q \rceil$. For integer $\mu \ge 1$, let $C : \mathrm{IDSp} \to \mathbb{Z}_q^{\bar{n} \times \bar{n}} \times \{0,1\}^\mu$ denote an injective encoding of identities that will be instantiated for a specific scheme.

Our IBTDF. Our IBTDF $L[\mu, \mathrm{IDSp}, C]$ is associated with an integer $\mu \ge 1$, an identity space IDSp and an injective encoding $C$. It has domain $\mathrm{InSp} = I_\beta^{m+n} \times I_\gamma^{\bar{n}-n}$ and auxiliary input space $(\mathbb{Z}_q^{\bar{n} \times \bar{n}})^\mu$, and is given by the following algorithms.

1. Parameters: Given input $A \in \mathbb{Z}_q^{\bar{n} \times \bar{m}}$ and auxiliary input $H = (H[1], \ldots, H[\mu]) \in (\mathbb{Z}_q^{\bar{n} \times \bar{n}})^\mu$, algorithm $L[\mu, \mathrm{IDSp}, C].\mathrm{Pg}$ chooses $R = (R[1], \ldots, R[\mu]) \leftarrow (D^{\bar{m} \times \bar{n}w})^\mu$ and lets $U = (U[1], \ldots, U[\mu]) \in (\mathbb{Z}_q^{\bar{n} \times \bar{n}w})^\mu$, where

\[
U[i] := H[i] \cdot G - A R[i].
\]

It also chooses $R' \leftarrow D^{\bar{m} \times \bar{n}w}$ and lets $A' = AR'$. It returns pars $= (A, A', U)$ as the public parameters and msk $= (R, H)$ as the master secret key.

Note that $R[i]$ is a trapdoor with tag $H[i]$ for $[A \mid U[i]]$. Moreover, since each $R[i]$ is subgaussian with parameter $b \cdot \omega(\sqrt{\log n})$, we have (by Lemma 5.1) $s_1(R[i]) = O(b\sqrt{m}) \cdot \omega(\sqrt{\log n})$ for all $i$, with overwhelming probability.

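For pars $= (A, A', U)$ and a user identity $id$ with $C(id) = (H[0], c \in \{0,1\}^\mu)$, define

\[
A(id) := \Big[\, A \;\Big|\; H[0] \cdot G + \sum_{i=1}^{\mu} c[i]\, U[i] \,\Big].
\]

For $U$ as constructed by $L[\mu, \mathrm{IDSp}, C].\mathrm{Pg}$, we have

\[
A(id) = \Big[\, A \;\Big|\; \Big(H[0] + \sum_{i=1}^{\mu} c[i]\, H[i]\Big) \cdot G - A \cdot \sum_{i=1}^{\mu} c[i]\, R[i] \,\Big]. \tag{22}
\]

Define

\[
R(id) := \sum_{i} c[i]\, R[i] \quad\text{and}\quad H(id) := H[0] + \sum_{i} c[i]\, H[i],
\]

and note that $R(id)$ is a trapdoor with tag $H(id)$ for $A(id)$. Moreover, by the above bound on $s_1(R[i])$ and the triangle inequality, we have $s_1(R(id)) = O(\mu b \sqrt{m}) \cdot \omega(\sqrt{\log n}) = \mathrm{poly}(n)$ for all $id$, with overwhelming probability. In what follows we assume that this bound holds.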

2. Key generation: Given public parameters pars $= (A, A', U)$, master secret $(R, H)$ and identity $id \in \mathrm{IDSp}$ with $C(id) = (H[0], c \in \{0,1\}^\mu)$, algorithm $L[\mu, \mathrm{IDSp}, C].\mathrm{Kg}$ proceeds as follows. It computes $A(id)$, $R(id)$, and $H(id)$ as defined above. Define

\[
A'(id) := [A(id) \mid A'].
\]

If $H(id)$ is invertible, it runs $\mathrm{DelTrap}(R(id), A'(id), H' = I, s)$ from Item 2 of Lemma 5.3 to generate a trapdoor $D(id)$ with tag $I$ for $A'(id)$, for a sufficiently large $s = \Theta(\mu b^2 \sqrt{m}) \cdot \omega(\sqrt{\log n})^2$.

Note that $s = \Omega(b \cdot s_1(R(id))) \cdot \omega(\sqrt{\log n})$ as required by Lemma 5.3 and that with overwhelming probability,

\[
s_1(D(id)) = s \cdot O(\sqrt{m}) = O(\mu b^2 m) \cdot \omega(\sqrt{\log n})^2 = \mathrm{poly}(n).
\]

proc Initialize(id)   // RL_1
    H_1 <-$ Aux_1(id) ; Win <- true
    Ā <-$ Z_q^{n × m̄} ; E^t <-$ D_{Z,αq}^{(n̄-n) × (m̄+n)} ; [I ; A] = [I ; E^t] [I ; Ā]
    (pars, msk) <-$ L[μ, IDSp, C].Pg(A, H_1)
    IS <- ∅ ; id* <- id
    Return pars

proc Initialize(id)   // R_i (i ∈ {0, 1})
    H_0 <-$ Aux_0(id) ; H_1 <-$ Aux_1(id) ; Win <- true
    A <-$ Z_q^{n̄ × m̄}
    (pars, msk) <-$ L[μ, IDSp, C].Pg(A, H_i)
    IS <- ∅ ; id* <- id
    Return pars

proc GetDK(id)   // RL_1, R_0, R_1
    IS <- IS ∪ {id}
    If either H_0(id) or H_1(id) is not invertible
        then Win <- false ; dk <- ⊥
    Else dk <- L[μ, IDSp, C].Kg(pars, msk, id)
    Return dk

proc Ch(id)   // RL_1, R_0, R_1
    id* <- id
    If H_0(id) ≠ 0_{n̄ × n̄} or H_1(id) ≠ 0_{n̄ × n̄}
        then Win <- false

proc Finalize(d')   // RL_1, R_0, R_1
    Return ((d' = 1) and (id* ∉ IS) and Win)

Figure 7: Games RL_1 (“Real-to-Lossy”) and R_0, R_1 associated to n, μ, IDSp and auxiliary input generator algorithms Aux_0 and Aux_1.


3. Evaluate: Given public parameters pars $= (A, A', U)$, identity $id \in \mathrm{IDSp}$ and input $x \in I_\beta^{m+n} \times I_\gamma^{\bar{n}-n}$, algorithm $L[\mu, \mathrm{IDSp}, C].\mathrm{Ev}$ computes $A'(id) = [A(id) \mid A']$ as above, and outputs $y = g_{A'(id)}(x)$.

4. Invert: Given parameters $(A, A', U)$ and identity $id \in \mathrm{IDSp}$ determining $A'(id)$ as above, trapdoor $D(id)$ (with tag $I$) for $A'(id)$, and value $y = g_{A'(id)}(x)$ as above, algorithm $L[\mu, \mathrm{IDSp}, C].\mathrm{Ev}^{-1}$ returns $x$ using the inversion algorithm from Item 1 of Lemma 5.3.
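The parameter-generation step and the identity in Equation (22) can be checked numerically on small matrices. The sketch below is only an illustration under stated assumptions: the gadget matrix is taken to be $G = I_{\bar{n}} \otimes (1, b, \ldots, b^{w-1})$, the subgaussian draw of each $R[i]$ is replaced by uniform entries in $\{-1, 0, 1\}$, and all names and parameter values are hypothetical. It verifies the trapdoor relation $[A \mid \cdot\,] \cdot [R(id) ; I] = H(id) \cdot G \pmod q$ for the right-hand block of $A(id)$.

```python
import random


def mat_mul(X, Y, q):
    """Matrix product X * Y reduced mod q."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % q
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_lin(X, Y, q, coeff=1):
    """Entrywise X + coeff * Y reduced mod q."""
    return [[(x + coeff * y) % q for x, y in zip(rx, ry)]
            for rx, ry in zip(X, Y)]


def rand_mat(rng, rows, cols, lo, hi):
    return [[rng.randint(lo, hi) for _ in range(cols)] for _ in range(rows)]


def gadget(n_bar, b, w):
    """G = I_{n_bar} tensor (1, b, ..., b^(w-1)), shape n_bar x (n_bar * w)."""
    G = [[0] * (n_bar * w) for _ in range(n_bar)]
    for i in range(n_bar):
        for j in range(w):
            G[i][i * w + j] = b ** j
    return G


def check_equation_22(seed=1, q=97, b=2, w=7, n_bar=2, m_bar=5, mu=3):
    rng = random.Random(seed)
    G = gadget(n_bar, b, w)
    A = rand_mat(rng, n_bar, m_bar, 0, q - 1)
    H = [rand_mat(rng, n_bar, n_bar, 0, q - 1) for _ in range(mu)]
    # Pg: small R[i]; entries in {-1, 0, 1} stand in for the subgaussian draw.
    R = [rand_mat(rng, m_bar, n_bar * w, -1, 1) for _ in range(mu)]
    U = [mat_lin(mat_mul(Hi, G, q), mat_mul(A, Ri, q), q, coeff=-1)
         for Hi, Ri in zip(H, R)]
    # A hypothetical identity encoding C(id) = (H[0], c).
    H0 = rand_mat(rng, n_bar, n_bar, 0, q - 1)
    c = [rng.randint(0, 1) for _ in range(mu)]
    right = mat_mul(H0, G, q)               # right block of A(id)
    H_id = [row[:] for row in H0]
    R_id = [[0] * (n_bar * w) for _ in range(m_bar)]
    for ci, Ui, Hi, Ri in zip(c, U, H, R):
        right = mat_lin(right, Ui, q, coeff=ci)
        H_id = mat_lin(H_id, Hi, q, coeff=ci)
        R_id = [[r + ci * s for r, s in zip(rr, rs)]
                for rr, rs in zip(R_id, Ri)]
    # Trapdoor relation: A * R(id) + right = H(id) * G  (mod q).
    lhs = mat_lin(mat_mul(A, R_id, q), right, q)
    return lhs == mat_mul(H_id, G, q)


print(check_equation_22())
```

The check passes for any choice of `H`, `R` and `c` because the terms $A \cdot R[i]$ cancel; what the security analysis additionally needs is that $R(id)$ stays short, which the toy entries in $\{-1, 0, 1\}$ only mimic.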

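Key generation, invertibility, and lossiness. The choice of auxiliary input $H$ determines the ability to generate keys for identities, i.e., the induced IBTDF $L[\mu, \mathrm{IDSp}, C](H)$ can generate a key $D_{id}$ for any $id$ such that $H(id)$ is invertible. By the upper bound on $\beta$ from Equation (20), inversion is correct as long as $\beta \le q / \Omega(s_1(D_{id}) \cdot \sqrt{m})$.

By contrast, suppose that the $A \in \mathbb{Z}_q^{\bar{n} \times \bar{m}}$ given to $L[\mu, \mathrm{IDSp}, C].\mathrm{Pg}$ is such that $\begin{bmatrix} I_{\bar{m}} \\ A \end{bmatrix} = \begin{bmatrix} I_{\bar{m}+n} \\ E^t \end{bmatrix} \begin{bmatrix} I_{\bar{m}} \\ \bar{A} \end{bmatrix}$ for some $\bar{A} \in \mathbb{Z}_q^{n \times \bar{m}}$ and $E^t = [E_1^t \mid E_2^t] \in \mathbb{Z}^{(\bar{n}-n) \times \bar{m}} \times \mathbb{Z}^{(\bar{n}-n) \times n}$, i.e., $A$ is a structured matrix that satisfies the hypothesis of Lemma 5.4. Then if $H(id) = 0$, it can be verified that $A(id)$ is such that

\[
\begin{bmatrix} I_{\bar{m}+\bar{n}w} \\ A(id) \end{bmatrix}
=
\begin{bmatrix}
I_{\bar{m}} & & \\
& I_{\bar{n}w} & \\
& & I_n \\
E_1^t & -E_1^t R(id) & E_2^t
\end{bmatrix}
\begin{bmatrix}
I_{\bar{m}} & \\
& I_{\bar{n}w} \\
\bar{A} & -\bar{A} R(id)
\end{bmatrix},
\]

which satisfies the hypothesis of Lemma 5.4 with $[\bar{A} \mid -\bar{A} \cdot R(id)]$ in place of $\bar{A}$ and $\tilde{E}^t = [E_1^t \mid -E_1^t R(id) \mid E_2^t]$ in place of $E^t$. Observe that by the triangle inequality, $s_1(\tilde{E}) \le s_1(E)(1 + s_1(R(id))) \le s_1(E) \cdot \mathrm{poly}(n)$. In particular, if we have a known $\mathrm{poly}(n)$ upper bound on $s_1(E)$, then as described in the analysis following the proof of Lemma 5.4 we can instantiate the parameters to have correct inversion when $H(id)$ is invertible, and $\Omega(m)$-lossiness when $H(id) = 0$.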

In what follows we show security of the scheme in the selective-id and adaptive-id models, under the LWE assumption.

5.4  Real-to-lossy lemma

Consider game $\mathrm{RL}_1$ which is defined as in Figure 7, where $A$ is such that $\begin{bmatrix} I \\ A \end{bmatrix} = \begin{bmatrix} I \\ E^t \end{bmatrix} \begin{bmatrix} I \\ \bar{A} \end{bmatrix}$ for uniformly random $\bar{A} \in \mathbb{Z}_q^{n \times \bar{m}}$ and $E^t \leftarrow D_{\mathbb{Z},\alpha q}^{(\bar{n}-n) \times (\bar{m}+n)}$. Games $\mathrm{R}_0$ and $\mathrm{R}_1$ are defined similarly, where the distribution of $A$ is uniformly random.

The following lemma says it is hard to distinguish game $\mathrm{R}_0$ from $\mathrm{RL}_1$. We will apply this by defining $\mathrm{Aux}_0$ and $\mathrm{Aux}_1$ in such a way that the output of $\mathrm{Aux}_0$ results in the real scheme and the output of $\mathrm{Aux}_1$ results in a lossy setup.

Lemma 5.7 Let $n, \mu \ge 1$ be integers and IDSp an identity space. Let $\mathrm{Aux}_0$ and $\mathrm{Aux}_1$ be auxiliary input generators for $L[\mu, \mathrm{IDSp}, C]$ and $\mathcal{A}$ an adversary. Then there is an adversary $\mathcal{B}$ such that

\[
\Pr[\mathrm{R}_0^{\mathcal{A}}] - \Pr[\mathrm{RL}_1^{\mathcal{A}}] \le \mathbf{Adv}^{\mathrm{lwe}}(\mathcal{B}) + \mathrm{negl}(n). \tag{23}
\]

The running time of $\mathcal{B}$ is that of $\mathcal{A}$ plus some overhead. If $\mathcal{A}$ is selective-id then so is $\mathcal{B}$.

The last statement allows us to use the lemma in both the selective-id and adaptive-id cases.

Proof: By Remark 5.5 we have that

\[
\Pr[\mathrm{R}_1^{\mathcal{A}}] - \Pr[\mathrm{RL}_1^{\mathcal{A}}] \le \mathbf{Adv}^{\mathrm{lwe}}(\mathcal{B}). \tag{24}
\]

We claim that in $\mathrm{R}_0$ and $\mathrm{R}_1$ (where $A$ is uniformly random) the values $H_0$ and $H_1$ are statistically hidden from $\mathcal{A}$'s view. By Lemma 5.2 the tuple $(A, AR[1], \ldots, AR[\mu])$ is $\mathrm{negl}(n)$-far from uniformly random. Hence the public parameters $(A, A', U)$ are $\mathrm{negl}(n)$-far from uniform for any fixed choice of the auxiliary input $H$. Since the execution of the remaining game is independent of whether $H$ comes from $\mathrm{Aux}_0$ or $\mathrm{Aux}_1$, we obtain

\[
\Pr[\mathrm{R}_0^{\mathcal{A}}] - \Pr[\mathrm{R}_1^{\mathcal{A}}] \le \mathrm{negl}(n), \tag{25}
\]

which concludes the proof. ∎


5.5  Selective-id  Security 


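We consider IBTDF $L[\mu = 1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}]$, the instance of our construction with identity space $\mathrm{IDSp} = \mathbb{Z}_q^{\bar{n}} \setminus \{0\}$, uniformly random input $A \in \mathbb{Z}_q^{\bar{n} \times \bar{m}}$, auxiliary input $H_0 = H_0[1] = -C_{\mathrm{frd}}(0) \in \mathbb{Z}_q^{\bar{n} \times \bar{n}}$, and identity encoding $C_{\mathrm{FRD}}(id) = (C_{\mathrm{frd}}(id), 1) \in \mathbb{Z}_q^{\bar{n} \times \bar{n}} \times \{0,1\}$, where $C_{\mathrm{frd}} : \mathbb{Z}_q^{\bar{n}} \to \mathbb{Z}_q^{\bar{n} \times \bar{n}}$ is an “invertible differences” encoding as constructed in prior work. (I.e., for each $x \ne x'$, the matrix $C_{\mathrm{frd}}(x) - C_{\mathrm{frd}}(x')$ is invertible over $\mathbb{Z}_q$.)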
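The “invertible differences” property is easy to test exhaustively on a toy example. The sketch below does not reproduce the encoding cited in the text; it assumes one standard way to obtain such a map, namely sending a vector to the matrix of multiplication by the corresponding element of a finite field, here $\mathbb{Z}_7[X]/(X^2+1)$ with dimension 2. The modulus, the dimension and the function names are illustrative choices.

```python
from itertools import product

Q = 7  # prime with Q % 4 == 3, so X^2 + 1 has no root mod Q


def encode(v):
    """Matrix of multiplication by v[0] + v[1]*X in Z_Q[X]/(X^2 + 1)."""
    a0, a1 = v
    return [[a0 % Q, (-a1) % Q], [a1 % Q, a0 % Q]]


def det_mod(M):
    return (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % Q


def difference(M, N):
    return [[(m - n) % Q for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]


def has_invertible_differences():
    """Check every pair of distinct vectors in Z_Q^2."""
    vectors = list(product(range(Q), repeat=2))
    return all(det_mod(difference(encode(x), encode(y))) != 0
               for x in vectors for y in vectors if x != y)


print(has_invertible_differences())
```

The map is linear, so the difference of two encodings is the encoding of the difference, and a nonzero field element always has an invertible multiplication matrix; this is why the exhaustive check succeeds.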

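Note that our scheme satisfies the correct inversion requirement because $H_0(id) = C_{\mathrm{frd}}(id) - C_{\mathrm{frd}}(0)$ is invertible for all $id \in \mathrm{IDSp} = \mathbb{Z}_q^{\bar{n}} \setminus \{0\}$. We show that this IBTDF is selective-id $\delta$-lossy for $\delta = 1$, meaning fully selective-id lossy, and hence selective-id one-way. To do this we define a sibling $LF[\mu = 1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}]$. It preserves the key-generation, evaluation and inversion algorithms of $L[1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}]$ and alters parameter generation to

Algorithm $LF[1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}].\mathrm{Pg}(id)$:
    $\bar{A} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$ ; $E^t \leftarrow D_{\mathbb{Z},\alpha q}^{(\bar{n}-n) \times (\bar{m}+n)}$ ; $\begin{bmatrix} I \\ A \end{bmatrix} = \begin{bmatrix} I \\ E^t \end{bmatrix} \begin{bmatrix} I \\ \bar{A} \end{bmatrix}$
    $H_1[1] = -C_{\mathrm{frd}}(id)$ ; (pars, msk) $\leftarrow L[1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}].\mathrm{Pg}(A, H_1)$ ; Return (pars, msk).

The following says that our IBTDF is 1-lossy with lossiness $\Omega(m)$, under the LWE assumption.

Theorem 5.8 Let $m = c_2 n \ge c_1 n = \bar{n}$ and $\ell = 2m$. Let $L = L[1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}]$ be the IBTDF associated by our construction to parameters $\mu = 1$ and $\mathrm{IDSp} = \mathbb{Z}_q^{\bar{n}} \setminus \{0\}$. Let $LF = LF[1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}]$ be the sibling associated to it as above. Let $\delta = 1$ and let $\mathcal{A}$ be a selective-id adversary. Then there is an adversary $\mathcal{B}$ such that

\[
\mathbf{Adv}^{\delta\text{-los}}_{L,LF,\ell}(\mathcal{A}) \le \mathbf{Adv}^{\mathrm{lwe}}(\mathcal{B}) + \mathrm{negl}. \tag{26}
\]

The running time of $\mathcal{B}$ is that of $\mathcal{A}$ plus overhead.

Proof: On input $id$, let algorithm $\mathrm{Aux}_0$ return $-C_{\mathrm{frd}}(0)$ and algorithm $\mathrm{Aux}_1$ return $-C_{\mathrm{frd}}(id)$. Let $\mathrm{R}_0, \mathrm{RL}_1$ be the games of Figure 7 with $\mu = 1$, $\mathrm{IDSp} = \mathbb{Z}_q^{\bar{n}} \setminus \{0\}$ and auxiliary input generators $\mathrm{Aux}_0$ and $\mathrm{Aux}_1$, respectively. Then we claim

\[
\Pr[\mathrm{Real}_L^{\mathcal{A}}] = \Pr[\mathrm{R}_0^{\mathcal{A}}] \quad\text{and}\quad \Pr[\mathrm{Lossy}_{L,LF,\ell}^{\mathcal{A}}] = \Pr[\mathrm{RL}_1^{\mathcal{A}}]. \tag{27}
\]

To justify this let $id^*$ be the identity queried by $\mathcal{A}$ to both Initialize and Ch. (These queries are the same because $\mathcal{A}$ is selective-id.) Then $H_1 = -C_{\mathrm{frd}}(id^*)$ so $H_1(id) = C_{\mathrm{frd}}(id) - C_{\mathrm{frd}}(id^*)$. Since $C_{\mathrm{frd}}$ is an encoding with invertible differences, this is invertible iff $id \ne id^*$. This means that the conjunct $(id^* \notin IS) \wedge \mathrm{Win}$ is always true. The claim of Equation (27) is now true because game $\mathrm{R}_0$ generates parameters with uniform $A$ and auxiliary input $H_0 = -C_{\mathrm{frd}}(0) \in \mathbb{Z}_q^{\bar{n} \times \bar{n}}$ that, via $L[1, \mathbb{Z}_q^{\bar{n}} \setminus \{0\}, C_{\mathrm{FRD}}]$, defines $L$. However game $\mathrm{RL}_1$ generates parameters with auxiliary input $H_1$. Since $H_1(id^*) = 0$, the function $g_{A'(id^*)}$ is $\Omega(m)$-lossy, as argued immediately following the description of the scheme. ∎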


5.6  Full  Security 


We consider IBTDF $L[\mu, \{0,1\}^\mu, C']$, the instance of our construction with $\mathrm{IDSp} = \{0,1\}^\mu$, uniformly random input $A \in \mathbb{Z}_q^{\bar{n} \times \bar{m}}$, auxiliary input $H_0 = (H_0[1], \ldots, H_0[\mu]) := (0_{\bar{n} \times \bar{n}}, \ldots, 0_{\bar{n} \times \bar{n}})$ and $C'(id) = (I_{\bar{n} \times \bar{n}}, C_f(id))$, where $C_f : \{0,1\}^\mu \to \mathbb{Z}_q^{\bar{n} \times \bar{n}}$ maps $x \in \{0,1\}^\mu$ into a vector $X$ of matrices such that $X[i] = (-1)^{x[i]} \cdot I \in \mathbb{Z}_q^{\bar{n} \times \bar{n}}$.

Note that our scheme satisfies the correct inversion requirement because $H_0(id) = I_{\bar{n} \times \bar{n}}$ is invertible for all $id \in \mathrm{IDSp}$. We show that this IBTDF is adaptive-id $\delta$-lossy for $\delta = (8Q)^{-1}$ where $Q$ is the number of key-derivation queries of the adversary. By Theorem 3.2 this means $L[\mu, \{0,1\}^\mu, C']$ is adaptive-id one-way. To do this we define a sibling $LF_Q[\mu, \{0,1\}^\mu, C']$. It preserves the key-generation, evaluation and inversion algorithms of $L[\mu, \{0,1\}^\mu, C']$ and alters parameter generation to


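Algorithm $LF_Q[\mu, \{0,1\}^\mu, C'].\mathrm{Pg}(id)$:
    $\bar{A} \leftarrow \mathbb{Z}_q^{n \times \bar{m}}$ ; $E^t \leftarrow D_{\mathbb{Z},\alpha q}^{(\bar{n}-n) \times (\bar{m}+n)}$ ; $\begin{bmatrix} I \\ A \end{bmatrix} = \begin{bmatrix} I \\ E^t \end{bmatrix} \begin{bmatrix} I \\ \bar{A} \end{bmatrix}$
    $H_1 \leftarrow \mathrm{Aux}_1$ ; (pars, msk) $\leftarrow L[\mu, \{0,1\}^\mu, C'].\mathrm{Pg}(A, H_1)$
    Return (pars, msk).

where $\mathrm{Aux}_1$ is a randomized algorithm from prior work that generates $H_1 \in (\mathbb{Z}_q^{\bar{n} \times \bar{n}})^\mu$ such that the image of $H_1(\cdot)$ is either $0_{\bar{n} \times \bar{n}}$ or invertible and $H_1(\cdot)$ is “pairwise independent”, i.e., for all $id \ne id'$, $\Pr_{\mathrm{Aux}_1}[H_1(id) = 0_{\bar{n} \times \bar{n}} \mid H_1(id') = 0_{\bar{n} \times \bar{n}}] = 1/(2Q)$. The following says that our IBTDF is $\delta$-lossy under the LWE assumption with lossiness $\ell = 2m$.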


Theorem 5.9 Let $m = c_2 n \ge c_1 n = \bar{n}$ and $\ell = 2m$. Let $L = L[\mu, \{0,1\}^\mu, C']$ be the IBTDF associated by our construction to parameters $\mu$ and $\mathrm{IDSp} = \{0,1\}^\mu$. Let $\mathcal{A}$ be an adaptive-id adversary that makes at most $Q$ queries and let $\delta = (8Q)^{-1}$. Let $LF = LF_Q[\mu, \{0,1\}^\mu, C']$ be the sibling associated to $L$ as above. Then there is an adversary $\mathcal{B}$ such that

\[
\mathbf{Adv}^{\delta\text{-los}}_{L,LF,\ell}(\mathcal{A}) \le \mathbf{Adv}^{\mathrm{lwe}}(\mathcal{B}) + \mathrm{negl}(n). \tag{28}
\]

The running time of $\mathcal{B}$ is that of $\mathcal{A}$ plus polynomial overhead.

Proof: (Sketch) Let $Q$ be the number of queries made by $\mathcal{A}$ and let algorithm $\mathrm{Aux}_1$ be defined as above. Let $\mathrm{R}_0, \mathrm{RL}_1$ be the games of Figure 7 with $\mathrm{IDSp} = \{0,1\}^\mu$ and this $\mathrm{Aux}_0$ and $\mathrm{Aux}_1$. Let $E(IS, id^*)$ denote the event that when Finalize($d'$) is called in $\mathrm{R}_0^{\mathcal{A}}$ the flag Win has not been set to false and $id^* \notin IS$. (Note that $E(IS, id^*)$ only depends on $IS, id^*$.) Let $\eta(IS, id^*)$ be the probability that $E(IS, id^*)$ happens. In prior work, it was shown that $\lambda_{\mathrm{low}} \le \eta(IS, id^*) \le \lambda_{\mathrm{up}}$ for explicit bounds $\lambda_{\mathrm{low}}$ and $\lambda_{\mathrm{up}}$. Since $\mathrm{R}_0^{\mathcal{A}}$ and $\mathrm{Real}_L^{\mathcal{A}}$ differ only through $E(IS, id^*)$, one would like to argue that $\lambda_{\mathrm{low}} \cdot \Pr[\mathrm{Real}_L^{\mathcal{A}}] = \Pr[\mathrm{R}_0^{\mathcal{A}}]$, but this is not true since $E(IS, id^*)$ and $\mathrm{Real}_L^{\mathcal{A}}$ may not be independent. To get rid of this unwanted dependence we consider a modification of $\mathrm{R}_0$ and $\mathrm{RL}_1$ which adds some artificial abort such that in total it always sets Win to false with probability around $1 - \lambda_{\mathrm{low}}$, independent of the view of the adversary. (Since, given $IS, id^*$, the exact value of $\eta(IS, id^*)$ cannot be computed efficiently, it needs to be approximated using sampling.) Concretely, games $\hat{\mathrm{R}}_0$ and $\widehat{\mathrm{RL}}_1$ are defined as $\mathrm{R}_0$ and $\mathrm{RL}_1$, respectively, the only difference being Finalize which is defined as follows.

proc Finalize(d')   // \hat{R}_0, \hat{RL}_1
    Compute an approximation η'(IS, id*) of η(IS, id*)
    If η'(IS, id*) > λ_low then set Win <- false with probability 1 - λ_low / η'(IS, id*)
    Return ((d' = 1) and (id* ∉ IS) and Win)
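The modified Finalize procedure can be sketched in a few lines. This is an illustration only: the event sampler passed to `estimate_eta` is a hypothetical stand-in for re-running the relevant part of the game on fixed $(IS, id^*)$, and the names `estimate_eta` and `finalize` are not from the original text. The point is the arithmetic of the artificial abort: if the natural survival probability is $\eta' > \lambda_{\mathrm{low}}$, an extra abort with probability $1 - \lambda_{\mathrm{low}}/\eta'$ brings the overall survival probability down to about $\lambda_{\mathrm{low}}$.

```python
import random


def estimate_eta(sample_event, trials, rng):
    """Monte Carlo estimate of eta = Pr[event] from independent samples."""
    return sum(1 for _ in range(trials) if sample_event(rng)) / trials


def finalize(d_prime, id_star, queried, win, eta_est, lam_low, rng):
    """Finalize with an artificial abort, following the procedure above."""
    if eta_est > lam_low and rng.random() < 1 - lam_low / eta_est:
        win = False  # artificial abort
    return d_prime == 1 and id_star not in queried and win


# Toy usage with made-up numbers: eta' = 0.5 and lambda_low = 0.25.
rng = random.Random(0)
survived = sum(
    finalize(1, "id*", set(), True, 0.5, 0.25, rng) for _ in range(10000)
)
print(survived / 10000)  # close to lambda_low / eta' = 0.5
```

In the toy usage the artificial step alone keeps Win true with probability $\lambda_{\mathrm{low}}/\eta' = 0.5$; combined with a natural survival probability of $\eta' = 0.5$ this gives an overall probability of about $\lambda_{\mathrm{low}} = 0.25$.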


One can again show that with a polynomial number of samples to compute approximation $\eta'(IS, id^*)$,

\[
\delta \cdot \Pr[\mathrm{Real}_L^{\mathcal{A}}] = \Pr[\hat{\mathrm{R}}_0^{\mathcal{A}}] \tag{29}
\]

where $\delta = \lambda_{\mathrm{low}}/2$ is as in the theorem statement. Similar to the proof of Theorem 5.8 we can show that

\[
\Pr[\mathrm{Lossy}_{L,LF,\ell}^{\mathcal{A}}] = \Pr[\widehat{\mathrm{RL}}_1^{\mathcal{A}}]. \tag{30}
\]

Now Equation (28) follows from Equation (1), Equation (29), Equation (30) and Lemma 5.7. ∎

A  Anonymous  IBE

In this section we describe an IBE scheme that is similar to IBE from Section 4 with the difference that it encrypts group elements (rather than bits) and it is slightly more efficient. We associate to any integer $\mu \ge 1$ and any identity space $\mathrm{IDSp} \subseteq \mathbb{Z}_p$ an IBE scheme $\mathrm{IBE}'[\mu, \mathrm{IDSp}]$ that has message space $\mathbb{G}_T$ and algorithms as follows:

1.  Parameters:  Algorithm  IBE[/i,  IDSp].Pg  lets  gf-G*;  t,z  4-  Z*  ;  g  gl  ;  Z  e(g,g)z.  It  then  lets 

H,  H,  U  A  G  ;  U  A  GM+1.  It  returns  pars  =  (g,  g,  H,U,  H ,  U ,  Z)  as  the  public  parameters  and 
msk  =  ( t,z )  as  the  master  secret  key. 

2.  Key  generation:  Given  parameters  (g,  g,  H,U,XJ ,  Z),  master  secret  (t,z)  and  identity  id  €  IDSp,  al¬ 
gorithm  IBE'[/i,  IDSp]. Kg  returns  decryption  key  (D\,  D2,  D3,  D4)  computed  by  letting  r,f  f-Zp  and 
setting 

Dl^gz  ■  R(U,  id)tr  •  H#  ;  D2^Ur  •  Hf  ;  D3  «-  g~tr  ;  D4  «-  g~tf  . 

3.  Encryption:  Given  parameters  (g,g,  H,  U,  U,  Z),  identity  id  €  IDSp  and  message  M  €  Gy,  algorithm 
IBE[/x,  IDSp].Enc  returns  ciphertext  (C\,  C2,  C3,  C4,  C5)  computed  as  follows.  It  lets  s,sf-  Zp  and 

Ci  <-  gs ;  C2  <r-  gs  ;  C3  <-  H{ U,  id)s  ■  Us ;  CA  ^  Hs+S ;  C5  <-  •  M  . 

4.  Decryption:  Given  parameters  (g,  g,  H,U,XJ ,  Z),  identity  id  €  IDSp,  decryption  key  (D3,  D2,  D4,  D3) 
for  id  and  ciphertext  (Ci,  C2,  C3,  C4,  C5),  algorithm  IBE[/x,  IDSp]. Dec  returns 

M  =  e(Gi,  D1)e(C2,  D2)e(C3,  D3)e(C4,  D4)C5  . 

Compared  to  IBE[/r,  IDSp]  from  Section  [4] ,  the  efficiency  improvement  consists  of  replacing  id)  by 

U  in  the  computation  of  D2  and  C3  and  of  setting  H  :=  H .  Using  the  techniques  of  the  ciphertext 
pseudorandomness lemma (Lemma 4.1) one can show that the elements $(C_1, C_2, C_3, C_4)$ of the ciphertext are pseudorandom. (Here the reduction knows the secret $z$.) In a final similar hybrid step one can also show that, under the Bilinear Diffie-Hellman assumption (which is implied by the DLIN assumption), the element $C_5$ is also pseudorandom. (Here the reduction knows the secret $t$.) As our main ID-based TDF result uses anonymous IBE techniques, the main ideas of this system's security are implicit in our main proof. A formal proof of the above stand-alone system is deferred to the full version.

B  Applications 

We expand first on the application to achieving deterministic IBE and then on achieving hedged IBE.

D-PKE. Deterministic PKE (D-PKE) cannot achieve IND-CPA security. Bellare, Boldyreva and O’Neill [7] defined a target notion PRIV for it that captures the best possible security under the condition that encryption is deterministic. D-PKE provides a way to do fast (logarithmic time) search on encrypted data. PEKS [20] offers higher security but takes linear time, and trading some security for a significant increase in searching speed is attractive for large databases.

Achieving PRIV for D-PKE has been (and remains) a challenge. It is possible in the RO model [7]. The best results without ROs are due to Boldyreva, Fehr and O’Neill [16], who show how to achieve PRIV without random oracles for message sequences which are blocksources, meaning each message has some min-entropy even given the previous ones. Using the Leftover Hash Lemma (LHL) [15, 39], they show that any LTDF is a D-PKE scheme that is PRIV-secure for blocksources as long as the lossy branch is a universal hash function.

D-IBE. We introduce deterministic IBE (D-IBE). The PRIV definition is easily extended to this setting. D-IBE offers, over D-PKE, the same advantages that IBE offers over PKE, for example that there are no certificates and encryption depends only on the identity of the receiver. Again, D-IBE can be achieved in the RO model by setting the coins of an IBE scheme to the RO-hash of the message. (This is how PKE is turned into D-PKE in the RO model in [7].) We ask what can be done without ROs.
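The random-oracle transform described above, in which the encryption coins are derived by hashing the message, can be sketched generically. In the sketch below, `toy_randomized_enc` is a deliberately insecure placeholder that is not an IBE scheme at all; it only plays the role of some randomized algorithm taking an identity, a message and coins. SHA-256 stands in for the random oracle, and hashing the identity together with the message is a choice made here for illustration rather than something stated in the text.

```python
import hashlib


def toy_randomized_enc(identity: bytes, message: bytes, coins: bytes) -> bytes:
    """Placeholder for a randomized encryption algorithm (NOT secure)."""
    if len(message) > 32:
        raise ValueError("toy scheme handles messages of at most 32 bytes")
    pad = hashlib.sha256(b"pad|" + identity + b"|" + coins).digest()
    body = bytes(m ^ p for m, p in zip(message, pad))
    return coins + body


def deterministic_enc(identity: bytes, message: bytes) -> bytes:
    """Derandomize by setting the coins to a hash of the message."""
    coins = hashlib.sha256(b"coins|" + identity + b"|" + message).digest()
    return toy_randomized_enc(identity, message, coins)


c1 = deterministic_enc(b"alice@example.org", b"hello")
c2 = deterministic_enc(b"alice@example.org", b"hello")
print(c1 == c2)  # equal plaintexts under one identity give equal ciphertexts
```

Equality of ciphertexts for equal plaintexts is exactly what makes fast search possible and also what rules out IND-CPA security, which is why the target notion for such schemes is PRIV.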

We show that our constructions of DLIN-based lossy IB-TDFs have the properties necessary to obtain PRIV-secure D-IBE schemes for blocksources under the paradigm of [16] in the selective case. We start by observing that the lossy branches are universal hash functions. This can be seen from Equation (3) and the related equations defining the construction. In the lossy case, $f(y, id) = 0$, and the function has a range $R$ of size $p^2$. Now if $x_1, x_2$ are distinct inputs, then the outputs of the function on them collide exactly when $(\langle s, x_1 \rangle, \langle \hat{s}, x_1 \rangle) = (\langle s, x_2 \rangle, \langle \hat{s}, x_2 \rangle)$. The probability that this happens when $s, \hat{s}$ are chosen at random from $\mathbb{Z}_p^n$ is $1/p^2 = 1/|R|$.
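The collision probability in the universality argument can be confirmed by brute force for small parameters. The sketch below models the lossy branch as the map sending $x$ to a pair of inner products with two independent key vectors modulo a prime $p$; this reading of the collision condition, the second key name `t`, and the chosen values of $p$ and the inputs are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def inner(u, v, p):
    """Inner product of two vectors modulo p."""
    return sum(a * b for a, b in zip(u, v)) % p


def collision_probability(x1, x2, p):
    """Exact probability, over both key vectors, that x1 and x2 collide."""
    keys = list(product(range(p), repeat=len(x1)))
    hits = sum(
        1
        for s in keys
        for t in keys
        if inner(s, x1, p) == inner(s, x2, p)
        and inner(t, x1, p) == inner(t, x2, p)
    )
    return Fraction(hits, len(keys) ** 2)


print(collision_probability((0, 1), (1, 1), 3))        # 1/9
print(collision_probability((1, 0, 1), (0, 1, 1), 5))  # 1/25
```

Each key vector independently satisfies $\langle s, x_1 - x_2 \rangle = 0$ with probability $1/p$ when $x_1 - x_2$ is nonzero modulo $p$, which gives $1/p^2$ for the pair.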

Hedged  IBE.  The  definitions  and  methods  of  [8]  can  be  extended  to  the  identity-based  setting  in 
a  straightforward  way  in  the  selective  setting  once  we  have  universal  lossy  IB-TDFs.  There  are  two 
approaches.  One  is  generic  composition  of  an  IBE  scheme  with  an  IB-TDF.  The  other  is  to  first  pad  the
message  with  randomness  and  then  apply  the  IB-TDF. 

Adaptive  setting.  It  remains  open  to  achieve  deterministic  or  hedged  IBE  in  the  adaptive  security 
setting.

Abstract

Recent  advances  in  lattice  cryptography,  mainly  stemming  from  the  development  of  ring-based 
primitives  such  as  ring-LWE,  have  made  it  possible  to  design  cryptographic  schemes  whose  efficiency  is 
competitive  with  that  of  more  traditional  number-theoretic  ones,  along  with  entirely  new  applications 
like  fully  homomorphic  encryption.  Unfortunately,  realizing  the  full  potential  of  ring-based  cryptography 
has  so  far  been  hindered  by  a  lack  of  practical  algorithms  and  analytical  tools  for  working  in  this  context. 
As  a  result,  most  previous  works  have  focused  on  very  special  classes  of  rings  such  as  power-of-two 
cyclotomics,  which  significantly  restricts  the  possible  applications. 

We  bridge  this  gap  by  introducing  a  toolkit  of  fast,  modular  algorithms  and  analytical  techniques  that 
can  be  used  in  a  wide  variety  of  ring-based  cryptographic  applications,  particularly  those  built  around 
ring-LWE.  Our  techniques  yield  applications  that  work  in  arbitrary  cyclotomic  rings,  with  no  loss  in  their 
underlying  worst-case  hardness  guarantees,  and  very  little  loss  in  computational  efficiency,  relative  to 
power-of-two cyclotomics. To demonstrate the toolkit’s applicability, we develop a few illustrative applications: two variant public-key cryptosystems, and a “somewhat homomorphic” symmetric encryption scheme. All apply to arbitrary cyclotomics, have tight parameters, and very efficient implementations.


1  Introduction 


The past few years have seen many exciting developments in lattice-based cryptography. Two such trends are the development of schemes whose efficiency is competitive with traditional number-theoretic ones (e.g., [Mic02] and follow-ups), and the breakthrough work of Gentry [Gen09b, Gen09a] (followed by others) on fully homomorphic encryption. While these two research threads currently occupy opposite ends of the efficiency spectrum, they are united by their use of algebraically structured ideal lattices arising from polynomial rings. The most efficient and advanced systems in both categories rely on the ring-LWE problem [LPR10], an analogue of the standard learning with errors problem [Reg05]. Informally (and a bit inaccurately), in a ring $R = \mathbb{Z}[X]/(f(X))$ for monic irreducible $f(X)$ of degree $n$, and for an integer modulus $q$ defining the quotient ring $R_q := R/qR = \mathbb{Z}_q[X]/(f(X))$, the ring-LWE problem is to distinguish pairs $(a_i, b_i = a_i \cdot s + e_i) \in R_q \times R_q$ from uniformly random pairs, where $s \in R_q$ is a random secret (which stays fixed over all pairs), the $a_i \in R_q$ are uniformly random and independent, and the error (or “noise”) terms $e_i \in R$ are independent and “short.”
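The informal description of ring-LWE samples can be made concrete with a small generator. The sketch below is illustrative only: it fixes $f(X) = X^n + 1$ (the power-of-two cyclotomic case), draws error coefficients uniformly from a small interval as a stand-in for a proper discrete Gaussian, and uses schoolbook multiplication instead of an FFT. The function names and parameter values are assumptions.

```python
import random


def negacyclic_mul(a, b, q):
    """Multiply two polynomials in Z_q[X]/(X^n + 1), coefficients low-first."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q  # X^n = -1
    return out


def ring_lwe_sample(s, q, rng, bound=2):
    """Return one pair (a, b = a*s + e) with small error coefficients."""
    n = len(s)
    a = [rng.randrange(q) for _ in range(n)]
    e = [rng.randint(-bound, bound) for _ in range(n)]
    b = [(x + y) % q for x, y in zip(negacyclic_mul(a, s, q), e)]
    return a, b


def centered(x, q):
    """Representative of x mod q in the interval (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


rng = random.Random(3)
q, n = 257, 8
s = [rng.randrange(q) for _ in range(n)]
a, b = ring_lwe_sample(s, q, rng)
noise = [centered(y - z, q) for y, z in zip(b, negacyclic_mul(a, s, q))]
print(noise)  # small values: whoever knows s can strip a*s and see the error
```

Without knowledge of `s`, the pair `(a, b)` is what the ring-LWE assumption says is hard to tell apart from a uniformly random pair.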

In  all  applications  of  ring-LWE,  and  particularly  those  related  to  homomorphic  encryption,  a  main 
technical  challenge  is  to  control  the  sizes  of  the  noise  terms  when  manipulating  ring-LWE  samples  under 
addition,  multiplication,  and  other  operations.  For  correct  decryption,  q  must  be  chosen  large  enough  so  that 
the  final  accumulated  error  terms  do  not  “wrap  around”  modulo  q  and  cause  decryption  error.  On  the  other 
hand,  the  error  rate  (roughly,  the  ratio  of  the  noise  magnitude  to  the  modulus  q)  of  the  original  published 
ring-LWE  samples  and  the  dimension  n  trade  off  to  determine  the  theoretical  and  concrete  hardness  of 
the  ring-LWE  problem.  Tighter  control  of  the  noise  growth  therefore  allows  for  a  larger  initial  error  rate, 
which  permits  a  smaller  modulus  q  and  dimension  n,  which  leads  to  smaller  keys  and  ciphertexts,  and  faster 
operations  for  a  given  level  of  security. 

Regarding the choice of ring, the class of cyclotomic rings $R = \mathbb{Z}[X]/\Phi_m(X)$, where $\Phi_m(X)$ is the $m$th cyclotomic polynomial (which has degree $n = \varphi(m)$ and is monic and irreducible over the rationals), has many attractive features that have proved very useful in cryptography. For example, the search/decision equivalence for ring-LWE in arbitrary cyclotomics [LPR10] relies on their special algebraic properties, as do many recent works that aim for more efficient fully homomorphic encryption schemes (e.g., [SV11, BGV12, GHS12a, GHS12b, GHPS12]). In particular, power-of-two cyclotomics, i.e., where the index $m = 2^k$ for some $k \ge 1$, are especially nice to work with, because (among other reasons) $n = m/2$ is also a power of two, $\Phi_m(X) = X^n + 1$ is maximally sparse, and polynomial arithmetic modulo $\Phi_m(X)$ can be performed very efficiently using just a slight tweak of the classical $n$-dimensional FFT (see, e.g., [LMPR08]). Indeed, power-of-two cyclotomics have become the dominant and preferred class of rings in almost all recent ring-based cryptographic schemes (e.g., [LMPR08, LM08, Lyu09, Gen09b, Gen10, LPR10, SS11, BV11b, BGV12, GHS12a, GHS12b, Lyu12, BPR12, MP12, GLP12, GHPS12]), often to the exclusion of all other rings.

While  power-of-two  cyclotomic  rings  are  very  convenient  to  use,  there  are  several  reasons  why  it  is 
essential  to  consider  other  cyclotomics  as  well.  The  most  obvious,  practical  reason  is  that  powers  of  two  are 
sparsely  distributed,  and  the  desired  concrete  security  level  for  an  application  may  call  for  a  ring  dimension 
much  smaller  than  the  next-largest  power  of  two.  So  restricting  to  powers  of  two  could  lead  to  key  sizes  and 
runtimes  that  are  at  least  twice  as  large  as  necessary.  A  more  fundamental  reason  is  that  certain  applications, 
such  as  the  above-mentioned  works  that  aim  for  more  efficient  (fully)  homomorphic  encryption,  require 
the  use  of  non-power-of-two  cyclotomic  rings.  This  is  because  power-of-two  cyclotomics  lack  the  requisite 
algebraic  properties  needed  to  implement  features  like  SIMD  operations  on  “packed”  ciphertexts,  or  plaintext 
spaces isomorphic to finite fields of characteristic two (other than $\mathbb{F}_2$ itself). A final important reason is
diversification of security assumptions. While some results are known [GHPS12] that relate ring-LWE in
cyclotomic  rings  when  one  index  m  divides  the  other,  no  other  connections  appear  to  be  known.  So  while  we 
might  conjecture  that  ring-LWE  and  ideal  lattice  problems  are  hard  in  every  cyclotomic  ring  (of  sufficiently 
high  dimension),  some  rings  might  turn  out  to  be  significantly  easier  than  others. 

Unfortunately, working in non-power-of-two cyclotomics is rather delicate, and the current state of affairs is unsatisfactory in several ways. Unlike the special case where $m$ is a power of two, in general the cyclotomic polynomial $\Phi_m(X)$ can be quite “irregular” and dense, with large coefficients. While in principle, polynomial arithmetic modulo $\Phi_m(X)$ can still be done in $O(n \log n)$ scalar operations (on high-precision complex numbers), the generic algorithms for achieving this are rather complex and hard to implement, with large constants hidden by the $O(\cdot)$ notation.

Geometrically, the non-power-of-two case is even more problematic. If one views $\mathbb{Z}[X]/(\Phi_m(X))$ as the set of polynomial residues of the form $a_0 + a_1 X + \cdots + a_{n-1} X^{n-1}$, and uses the naive “coefficient embedding” that views them as vectors $(a_0, a_1, \ldots, a_{n-1}) \in \mathbb{Z}^n$ to define geometric quantities like the $\ell_2$ norm, then both the concrete and theoretical security of cryptographic schemes depend heavily on the form of $\Phi_m(X)$. This stems directly from the fact that multiplying two polynomials with small norms can result in a polynomial residue having a much larger norm. The growth can be quantified by the “expansion factor” [LM06] of $\Phi_m(X)$, which unfortunately can be very large in the case of highly composite $m$ [Erd46]. Later works [GHS12a] circumvented such large expansion by using tricks like lifting to the larger-dimensional ring $\mathbb{Z}[X]/(X^m - 1)$, but this still involves a significant loss in the tolerable noise rates as compared with the power-of-two case.

In [PR07, LPR10] a different geometric approach was used, which avoided any dependence on the form
of the polynomial modulus $\Phi_m(X)$. In these works, the norm of a ring element is instead defined according to
its canonical embedding into $\mathbb{C}^n$, a classical concept from algebraic number theory. This gives a much better
way of analyzing expansion, since both addition and multiplication in the canonical embedding are simply
coordinate-wise. Working with the canonical embedding, however, introduces a variety of practical issues,
such as how to efficiently generate short noise terms having appropriate distributions over the ring. More
generally, the focus of [LPR10] was on giving an abstract mathematical definition of ring-LWE and proving
its  hardness  under  worst-case  ideal  lattice  assumptions;  in  particular,  it  did  not  deal  with  issues  related  to 
practical  efficiency,  bounding  noise  growth,  or  designing  applications  in  non-power-of-two  cyclotomics. 


1.1  Contributions 


Our  main  contribution  is  a  toolkit  of  modular  algorithms  and  analytical  techniques  that  can  be  used  in  a  wide 
variety  of  ring-based  cryptographic  applications,  particularly  those  built  around  ring-LWE.  The  high-level 
summary  is  that  using  our  techniques,  one  can  design  applications  to  work  in  arbitrary  cyclotomic  rings,  with 
no  loss  in  their  underlying  worst-case  hardness  guarantees,  and  very  little  loss  in  computational  efficiency, 
relative  to  the  best  known  techniques  in  power-of-two  cyclotomics.  In  fact,  our  analytical  techniques  even 
improve  the  state  of  the  art  for  the  power-of-two  case. 

In  more  detail,  our  toolkit  includes  fast,  specialized  algorithms  for  all  the  main  cryptographic  operations 
in  arbitrary  cyclotomic  rings.  Among  others,  these  include:  addition,  multiplication,  and  conversions  among 
various  useful  representations  of  ring  elements;  generation  of  noise  terms  under  probability  distributions 
that  guarantee  both  worst-case  and  concrete  hardness;  and  decoding  of  noise  terms  as  needed  in  decryption 
and  related  operations.  Our  algorithms’  efficiency  and  quality  guarantees  stem  primarily  from  our  use  of 
simple but non-obvious representations of ring elements, which differ from their naive representations as
polynomial residues modulo $\Phi_m(X)$. (See the second part of Section 1.2 for more details.) On the analytical
side,  we  give  tools  for  tightly  bounding  noise  growth  under  operations  like  addition,  multiplication,  and 
round-off/discretization.  (Recall  that  noise  growth  is  the  main  factor  determining  an  application’s  parameters 
and  noise  rates,  and  hence  its  key  sizes,  efficiency,  and  concrete  security.) 

Some  attractive  features  of  the  toolkit  include: 


• All the algorithms for arbitrary cyclotomics are simple, modular, and highly parallel, and work by
elementary reductions to the (very simple) prime-index case. In particular, they do not require any
polynomial reductions modulo $\Phi_m(X)$; in fact, they never need to compute $\Phi_m(X)$ at all! The
algorithms work entirely on vectors of dimension $n = \varphi(m)$, and run in $O(n \log n)$ or even $O(nd)$
scalar operations (with small hidden constants), where $d$ is the number of distinct primes dividing $m$.
With the exception of continuous noise generation, all scalar operations are low precision, i.e., they
involve small integers. In summary, the algorithms are very amenable to practical implementation.
(Indeed, we have implemented all the algorithms from scratch; the implementation will be described in a
separate work.)


• Our algorithm for decoding noise, used primarily in decryption, is fast (requiring $O(n \log n)$ or fewer
small-integer operations) and correctly recovers from optimally large noise rates. (See the last part
of Section 1.2 for details.) This improves upon prior techniques, which in general have worse noise
tolerance by anywhere between $m/2$ and super-polynomial $n^{\omega(1)}$ factors, and are computationally
slower and more complex due to polynomial reduction modulo $\Phi_m(X)$, among other operations.


• Our bounds on noise growth under ring addition and multiplication are exactly the same in all
cyclotomic rings; no ring-dependent “expansion factor” is incurred. (For discretizing continuous noise
distributions, our bounds are the same up to very small $1 + o(1)$ factors, depending on the primes
dividing $m$.) This allows applications to use essentially the same underlying noise rate as a function
of the ring dimension $n$, and hence be based on the same worst-case approximation factors, for all
cyclotomics. Moreover, our bounds improve upon the state of the art even for power-of-two cyclotomics:
e.g., our (average-case, high probability) expansion bound for ring multiplication improves upon the
(worst-case) expansion-factor bound by almost a $\sqrt{n}$ factor.


To illustrate the toolkit’s applicability, in Section 8 we develop the following applications:

1. A simple adaptation of the “dual” LWE-based public-key cryptosystem of [GPV08], which can serve
as a foundation for (hierarchical) identity-based encryption. (See Section 8.1.)

2. An efficient and compact public-key cryptosystem, which is essentially the “two element” system
outlined in [LPR10], but generalized to arbitrary cyclotomics, and with tight parameters. (See Section 8.2.)

3. A “somewhat homomorphic” symmetric encryption scheme, which follows the template of the
Brakerski-Vaikuntanathan [BV11a] and Brakerski-Gentry-Vaikuntanathan [BGV12] schemes in
power-of-two cyclotomics, but generalized to arbitrary cyclotomics and with much tighter noise analysis. This
application exercises all the various parts of the toolkit more fully, especially in its modulus-reduction
and key-switching procedures. (See Section 8.3.)


A final contribution of independent interest is a new “regularity lemma” for arbitrary cyclotomics, i.e.,
a bound on the smoothing parameter of random $q$-ary lattices over the ring. Such a lemma is needed for
porting many applications of standard LWE (and the related “short integer solution” SIS problem) to the ring
setting, including SIS-based signature schemes [GPV08, CHKP10, Boy10, MP12], the “primal” [Reg05] and
“dual” [GPV08] LWE cryptosystems (as in Section 8.1), chosen ciphertext-secure encryption schemes [Pei09,
MP12], and (hierarchical) identity-based encryption schemes [GPV08, CHKP10, ABB10]. In terms of
generality and parameters, our lemma essentially subsumes a prior one of Micciancio [Mic02] for the ring
$\mathbb{Z}[X]/(X^n - 1)$, and an independent one of Stehlé et al. [SSTX09] for power-of-two cyclotomics. See
Section 7 for further discussion.

Following the preliminary publication of this work, our toolkit has also been used centrally in the
“ring-switching” technique for homomorphic encryption [GHPS12], and to give efficient “bootstrapping”
algorithms for fully homomorphic encryption [AP13].


1.2 Techniques


The  tools  we  develop  in  this  work  involve  several  novel  applications  of  classical  notions  from  algebraic 
number  theory.  In  summary,  our  results  make  central  use  of:  (1)  the  canonical  embedding  of  a  number  field, 
which  endows  the  field  (and  its  subrings)  with  a  nice  and  easy-to-analyze  geometry;  (2)  the  decomposition  of 
arbitrary  cyclotomics  into  the  tensor  product  of  prime -power  cyclotomics,  which  yields  both  simpler  and 
faster algorithms for computing in the field, as well as geometrically nicer bases; and (3) the “dual” ideal $R^\vee$
and its “decoding” basis $\vec{d}$, for fast noise generation and optimal noise tolerance in decryption and related
operations.  We  elaborate  on  each  of  these  next. 

The canonical embedding. As in the previous works [PR07, LPR10], our analysis relies heavily on using
the canonical embedding $\sigma \colon K \to \mathbb{C}^n$ (rather than, say, the naive coefficient embedding) for defining
all geometric quantities, such as Euclidean norms and inner products. For example, under the canonical
embedding, the “expansion” incurred when multiplying by an element $a \in K$ is characterized exactly by
$\|\sigma(a)\|_\infty$, its $\ell_\infty$ norm under the canonical embedding; no (worst-case) ring-dependent “expansion factor”
is needed. So in the average-case setting, where the multiplicands are random elements from natural noise
distributions, for each multiplication we get at least a $\tilde{\Omega}(\sqrt{n})$ factor improvement over using the expansion
factor in all cyclotomics (including those with power-of-two index), and up to a super-polynomial
factor improvement in cyclotomics having highly composite indices. In our analysis of the noise tolerance of
decryption, we also get an additional factor savings over more simplistic analyses that only use norm
information, by using the notion of subgaussian random variables. These behave under linear transformations
in essentially the same way as Gaussians do, and have Gaussian tails. (Prior works that use subgaussianity in
lattice cryptography include [AP09, MP12].)
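
As a small numerical illustration of the coordinate-wise behaviour described above, the following Python sketch works in the cyclotomic ring of index 5. The choice $m = 5$, the sample elements, and all function names are assumptions made for this example only; it is not the toolkit's implementation. It checks that $\sigma(ab)$ equals the coordinate-wise product of $\sigma(a)$ and $\sigma(b)$, and hence that $\|\sigma(ab)\|_2 \le \|\sigma(a)\|_\infty \cdot \|\sigma(b)\|_2$.

```python
import cmath

M = 5  # cyclotomic index chosen for this sketch; Phi_5(X) = 1 + X + X^2 + X^3 + X^4
N = 4  # n = phi(5)


def mul_mod_phi5(a, b):
    """Multiply two residues modulo Phi_5(X), given as coefficient lists of length 4."""
    # Multiply modulo X^5 - 1 first (a cyclic convolution); Phi_5 divides X^5 - 1.
    c = [0] * M
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % M] += ai * bj
    # Eliminate X^4 using X^4 = -(1 + X + X^2 + X^3) modulo Phi_5.
    return [c[k] - c[4] for k in range(N)]


def sigma(a):
    """Canonical embedding: evaluate the residue at the four primitive 5th roots of unity."""
    roots = [cmath.exp(2j * cmath.pi * i / M) for i in range(1, M)]
    return [sum(ak * w ** k for k, ak in enumerate(a)) for w in roots]


def norm2(v):
    return sum(abs(x) ** 2 for x in v) ** 0.5


def norm_inf(v):
    return max(abs(x) for x in v)


a = [1, -2, 0, 3]
b = [2, 1, -1, 1]
ab = mul_mod_phi5(a, b)
sa, sb, sab = sigma(a), sigma(b), sigma(ab)

# Multiplication in the ring is coordinate-wise under the embedding.
mult_error = max(abs(z - x * y) for x, y, z in zip(sa, sb, sab))
# Consequently the expansion is governed by the l_infinity norm of sigma(a).
expansion_ok = norm2(sab) <= norm_inf(sa) * norm2(sb) + 1e-9
```

The same check goes through for any index $m$; only the reduction step (here hard-coded for $\Phi_5$) depends on the ring.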

Tensorial decomposition. An important fact at the heart of this work is that the $m$th cyclotomic number
field $K = \mathbb{Q}(\zeta_m) \cong \mathbb{Q}[X]/(\Phi_m(X))$ may instead be viewed as (i.e., is isomorphic to) the tensor product of
prime-power cyclotomics:

\[ K \cong \bigotimes_\ell K_\ell = \mathbb{Q}(\zeta_{m_1}, \zeta_{m_2}, \ldots), \]

where $m = \prod_\ell m_\ell$ is the prime-power factorization of $m$ and $K_\ell = \mathbb{Q}(\zeta_{m_\ell})$. Equivalently, in terms of
polynomials we may view $K$ as the multivariate field

\[ K \cong \mathbb{Q}[X_1, X_2, \ldots]/(\Phi_{m_1}(X_1), \Phi_{m_2}(X_2), \ldots), \tag{1.1} \]

where there is one indeterminate $X_\ell$ and modulus $\Phi_{m_\ell}(X_\ell)$ per prime-power divisor of $m$. Similar
decompositions hold for the ring of integers $R \cong \mathbb{Z}[X]/\Phi_m(X)$ and other important objects in $K$, such as the dual
ideal $R^\vee$ (described below).

Adopting the polynomial interpretation of $K$ from Equation (1.1) for concreteness, notice that a natural
$\mathbb{Q}$-basis is the set of multinomials $\prod_\ell X_\ell^{j_\ell}$ for each choice of $0 \le j_\ell < \varphi(m_\ell)$. We call this set the
“powerful” basis of $K$ (and of $R$). Interestingly, for non-prime-power $m$, under the field isomorphism with
$\mathbb{Q}[X]/(\Phi_m(X))$ that maps each $X_\ell \mapsto X^{m/m_\ell}$, the powerful basis does not coincide with the standard
“power” basis $1, X, X^2, \ldots, X^{\varphi(m)-1}$ usually used to represent the univariate field. It turns out that in
general,  the  powerful  basis  has  much  nicer  computational  and  geometric  properties  than  the  power  basis,  as 
we  outline  next. 
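
To make the difference between the two bases concrete, the sketch below lists the exponents $e$ for which $X^e$ (equivalently $\zeta_m^e$) belongs to the powerful basis, using the map $X_\ell \mapsto X^{m/m_\ell}$ described above. The helper names and the example index $m = 15$ are illustrative assumptions, not notation from this work.

```python
from itertools import product


def prime_power_factors(m):
    """Return the prime-power factorization of m as a list of (p, p**e) pairs."""
    factors, p = [], 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            while m % p == 0:
                m //= p
                q *= p
            factors.append((p, q))
        p += 1
    if m > 1:
        factors.append((m, m))
    return factors


def powerful_basis_exponents(m):
    """Sorted exponents e such that X**e ranges over the powerful basis for index m."""
    factors = prime_power_factors(m)
    # phi(p^e) = p^e - p^(e-1); each j_l ranges over [phi(m_l)].
    ranges = [range(q - q // p) for p, q in factors]
    exps = []
    for js in product(*ranges):
        # X_l maps to X^(m / m_l), so prod_l X_l^(j_l) maps to X^(sum_l j_l * m / m_l).
        exps.append(sum(j * (m // q) for j, (p, q) in zip(js, factors)) % m)
    return sorted(exps)


m = 15
powerful = powerful_basis_exponents(m)  # [0, 3, 5, 6, 8, 9, 11, 14]
power = list(range(len(powerful)))      # [0, 1, ..., 7], the power basis exponents
```

For $m = 15$ the two exponent sets differ, whereas for a prime power such as $m = 8$ the function returns $0, 1, 2, 3$, matching the power basis.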

Computationally, the tensorial decomposition of $K$ (with the powerful basis) allows us to modularly
reduce operations in $K$ (or $R$, or powers of $R^\vee$) to their counterparts in much simpler prime-power
cyclotomics (which themselves easily reduce to the prime-index case). We can therefore completely avoid all the
many algorithmic complications associated with working with polynomials modulo $\Phi_m(X)$. In particular,
we obtain novel, simple and fast algorithms, similar to the FFT, for converting between the multivariate
“polynomial” representation (i.e., the powerful basis) and the “evaluation” or “Chinese remainder”
representation, in which addition and multiplication are essentially linear time. Similarly, we obtain linear-time (or
nearly so) algorithms for switching between the polynomial representation and the “decoding” representation
used in decryption (described below), and for generating noise terms in the decoding representation. A final
advantage of the tensorial representation is that it yields trivial linear-time algorithms for computing the trace
function to cyclotomic subfields of $K$.

The  tensorial  representation  also  comes  with  important  geometrical  advantages.  In  particular,  under 
the  canonical  embedding  the  powerful  basis  is  better-conditioned  than  the  power  basis,  i.e.,  the  ratio  of  its 
maximal  and  minimal  singular  values  can  be  much  smaller.  This  turns  out  to  be  important  when  bounding 
the  additional  error  introduced  when  discretizing  (rounding  off)  field  elements  in  noise-generation  and 
modulus-reduction  algorithms,  among  others. 

The dual ideal $R^\vee$ and its decoding basis. Under the canonical embedding, the cyclotomic ring $R$ of
index $m$ embeds as a lattice which, unlike $\mathbb{Z}^n$, is in general not self-dual. Instead, its dual lattice corresponds
to a fractional ideal $R^\vee \subset K$ satisfying $R \subseteq R^\vee \subseteq m^{-1}R$, where the latter inclusion is nearly an equality.
(In fact, $R^\vee$ is a scaling of $R$ exactly when $m$ is a power of two, in which case $R = (m/2)R^\vee$.) In [LPR10]
it is shown that the “right” definition of the ring-LWE distribution, which arises naturally from the worst-case
to average-case reduction, involves the dual ideal $R^\vee$: the secret belongs to the quotient $R_q^\vee = R^\vee/qR^\vee$ (or
just $R^\vee$), and ring-LWE samples are of the form $(a, b = a \cdot s + e \bmod qR^\vee)$ for uniformly random $a \in R_q$
and error $e$ which is essentially spherical in the canonical embedding.

Although it is possible [DD12] to simplify the ring-LWE distribution by replacing every instance of $R^\vee$
with $R$, while retaining essentially spherical error (but scaled up by about $m$, corresponding to the approximate
ratio of $R$ to $R^\vee$), in this work we show that it is actually advantageous to retain $R^\vee$ and expose it in
applications (see footnote 1). The reason is that in general, $R^\vee$ supports correct bounded-distance decoding, which is the
main operation performed in decryption, under a larger error rate than $R$ does (see footnote 2). In fact, the error tolerance
of $R^\vee$ is optimal for the simple, fast lattice decoding algorithm used implicitly in essentially all decryption
procedures, namely Babai’s “round-off” algorithm [Bab85]. The reason is that when decoding a lattice $\Lambda$
using some basis $\{b_i\}$, the error tolerance depends inversely on the Euclidean lengths of the vectors dual
to $\{b_i\}$. For $R^\vee$, there is a particular “decoding” basis whose dual basis is optimally short (relative to the
determinant of $R$), whereas for $R$ no such basis exists in general (see footnote 3). In fact, the decoding basis of $R^\vee$ is simply
the dual of the (conjugate of the) powerful basis described above!
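
The dependence of round-off decoding on the dual basis can be seen already in two dimensions. The sketch below is a toy example on the lattice $\mathbb{Z}^2$, with bases, error vector, and function names chosen purely for illustration; it does not involve $R^\vee$ or the decoding basis itself. It decodes the same noisy point with two bases of the same lattice: one whose dual vectors are short, and one whose dual vectors are long.

```python
def inverse_2x2(B):
    """Inverse of a 2x2 matrix given as a list of rows; its rows are the dual basis vectors."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(B, z):
    return [B[0][0] * z[0] + B[0][1] * z[1], B[1][0] * z[0] + B[1][1] * z[1]]


def round_off_decode(B, t):
    """Babai round-off: express t in the basis B (columns of B) and round each coefficient."""
    coeffs = mat_vec(inverse_2x2(B), t)
    return [round(c) for c in coeffs]


# Two bases (as columns) of the same lattice Z^2; the second differs by a unimodular shear.
good = [[1.0, 0.0], [0.0, 1.0]]  # dual vectors (1, 0) and (0, 1): both of length 1
bad = [[1.0, 4.0], [0.0, 1.0]]   # dual vectors (1, -4) and (0, 1): one of length sqrt(17)

point = [-2.0, -1.0]             # a lattice point
error = [0.1, 0.3]               # a small error vector
target = [point[0] + error[0], point[1] + error[1]]

good_result = mat_vec(good, round_off_decode(good, target))  # recovers `point`
bad_result = mat_vec(bad, round_off_decode(bad, target))     # lands on a different lattice point
```

Round-off succeeds exactly when every inner product of the error with a dual basis vector has absolute value below 1/2; with the sheared basis, the long dual vector $(1, -4)$ amplifies the same error to about $-1.1$, so decoding fails.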

In  addition  to  its  optimal  error  tolerance,  we  also  show  that  the  decoding  basis  has  good  computational 
properties.  In  particular,  there  are  linear-time  (or  nearly  so)  algorithms  for  converting  to  the  decoding  basis 
from the other bases of $R^\vee$ or $R_q^\vee$ that are more appropriate for other computational tasks. And Gaussian
errors,  especially  spherical  ones,  can  be  sampled  in  essentially  linear  time  in  the  decoding  basis. 

Footnote 1: This is unless $m$ is a power of two, in which case nothing is lost by simply scaling up by exactly $m/2$ to replace $R^\vee$ with $R$.

Footnote 2: By “error rate” here we mean the ratio of the error (in, say, $\ell_2$ norm) to the dimension-normalized determinant $\det(\Lambda)^{1/n}$ of the
lattice $\Lambda$, so exact scaling has no effect on the error rate.

Footnote 3: We note that decoding by “lifting” $R$ to the larger-dimensional ring $\mathbb{Z}[X]/(X^m - 1)$, as done in [GHS12a], still leads to at
least an $m/2$ factor loss in error tolerance overall, because some inherent loss is already incurred when replacing $R^\vee$ with $R$, and a
bit more is lost in the lifting procedure.

Notation, Description, See:

$m$, $n$, $\hat{m}$
    The cyclotomic index, a positive integer having prime-power factorization
    $m = \prod_\ell m_\ell$, so that $n = \varphi(m) = \prod_\ell \varphi(m_\ell)$. Also, $\hat{m} = m/2$ if $m$ is even,
    otherwise $\hat{m} = m$.

$K = \mathbb{Q}(\zeta_m) \cong \mathbb{Q}[X]/(\Phi_m(X)) \cong \bigotimes_\ell \mathbb{Q}(\zeta_{m_\ell})$    (See §2.5.1)
    The $m$th cyclotomic number field, where $\zeta_m$ denotes an abstract element
    having order $m$ over $\mathbb{Q}$. (Here $\Phi_m(X) \in \mathbb{Z}[X]$ is the $m$th cyclotomic
    polynomial, the minimal polynomial of $\zeta_m$, which has degree $n$.) It is
    best viewed as the tensor product of the cyclotomic subfields $K_\ell = \mathbb{Q}(\zeta_{m_\ell})$.

$\sigma \colon K \to \mathbb{C}^n$    (See §2.5.2)
    The canonical embedding of $K$, which endows $K$ with a geometry, e.g.,
    $\|a\|_2 := \|\sigma(a)\|_2$ for $a \in K$. Both addition and multiplication in $K$
    correspond to their coordinate-wise counterparts in $\mathbb{C}^n$, yielding tight
    bounds on “expansion” under ring operations.

$R = \mathbb{Z}[\zeta_m] \cong \mathbb{Z}[X]/(\Phi_m(X)) \cong \bigotimes_\ell \mathbb{Z}[\zeta_{m_\ell}]$    (See §2.5.3)
    The ring of integers of $K$. It is best viewed as a tensor product of subrings
    $R_\ell = \mathbb{Z}[\zeta_{m_\ell}]$.

$R^\vee = \langle t^{-1} \rangle$, with $g, t \in R$    (See §2.5.4)
    The dual fractional ideal of $R$, generated by $t^{-1} = g/\hat{m}$, so $R \subseteq R^\vee \subseteq
    \hat{m}^{-1}R$. Each of $R^\vee$, $g$, and $t$ can be seen as the tensor products of their
    counterparts in the subfields $K_\ell$.

$\vec{p} \subset R$    (See §4)
    The “powerful” $\mathbb{Z}$-basis of $R$, defined as the tensor product of the power
    $\mathbb{Z}$-bases of each $\mathbb{Z}[\zeta_{m_\ell}]$. For non-prime-power $m$, it differs from the
    power $\mathbb{Z}$-basis $\{\zeta_m^0, \zeta_m^1, \ldots, \zeta_m^{n-1}\}$ often used to represent $\mathbb{Z}[\zeta_m]$, and
    has better computational and geometric properties.

$\vec{c} \subset R_q$    (See §2.5.5, §5)
    The “Chinese remainder” (CRT) $\mathbb{Z}_q$-basis of $R_q = R/qR$, for any prime
    $q = 1 \bmod m$. It yields linear-time addition and multiplication in $R_q$,
    and there is an $O(n \log n)$-time algorithm for converting between $\vec{c}$ and $\vec{p}$
    (as a $\mathbb{Z}_q$-basis of $R_q$).

$\vec{d} \subset R^\vee$    (See §6)
    The “decoding” $\mathbb{Z}$-basis of $R^\vee$, defined as the dual of the (conjugate
    of the) powerful basis $\vec{p}$. It is used for optimal decoding of $R^\vee$ and its
    powers, and for efficiently sampling Gaussians.

Figure 1: Dramatis Personæ.


1.3  Organization 

We draw the reader’s attention to Figure 1, which provides a glossary of the main algebraic objects and
notation  used  in  this  work,  and  pointers  to  further  discussion  of  their  properties.  The  rest  of  the  paper  is 
organized  as  follows: 


Section 2: Covers background on our (unusual, but useful) notation for vectors, matrices and tensors;
Gaussian and subgaussian random variables; lattices and basic decoding/discretization algorithms;
algebraic number theory; and ring-LWE. For the reader with some background in algebraic number
theory, we draw attention to the lesser-known material in Section 2.5.1 on the tensorial decomposition
into prime-power cyclotomics, and Section 2.5.4 on duality ($R^\vee$, dual bases, etc.).

Section 3: Recalls a “sparse decomposition” of the discrete Fourier transform (DFT) matrix, and develops a
novel sparse decomposition for a closely related one that we call the “Chinese remainder transform,”
which plays a central role in many of our fast algorithms.

Section 4: Defines the “powerful” $\mathbb{Z}$-basis $\vec{p}$ of $R$ and describes its algebraic and geometric properties.

Section 5: Defines the “Chinese remainder” $\mathbb{Z}_q$-basis $\vec{c}$ of $R_q$, gives its connection to the powerful basis, and
describes how it enables fast ring operations.

Section 6: Defines the “decoding” basis $\vec{d}$ of $R^\vee$, gives its connection to the powerful basis, describes how it
is used for decoding with optimal noise tolerance, and shows how to efficiently generate (continuous)
Gaussians as represented in the decoding basis.

Section 7: Gives a regularity lemma for random lattices over arbitrary cyclotomics. This is needed for only
one of our applications, as well as for adapting prior signature schemes and LWE-based (hierarchical)
identity-based encryption schemes to the ring setting.

Section 8: Gives some applications of the toolkit: two basic public-key encryption schemes, and a “somewhat
homomorphic” symmetric-key encryption scheme.

Acknowledgments. We thank Markus Püschel for his help with the sparse decomposition of the “Chinese
remainder transform,” and Damien Stehlé for useful discussions.

2  Preliminaries 

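For a positive integer $k$, we let $[k]$ denote the set $\{0, \ldots, k-1\}$. For any $\bar{a} \in \mathbb{R}/\mathbb{Z}$, we let $\llbracket \bar{a} \rrbracket \in \mathbb{R}$ denote
the unique representative $a \in (\bar{a} + \mathbb{Z}) \cap [-1/2, 1/2)$. Similarly, for $\bar{a} \in \mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$ we let $\llbracket \bar{a} \rrbracket$ denote the
unique representative $a \in (\bar{a} + q\mathbb{Z}) \cap [-q/2, q/2)$. We extend $\llbracket \cdot \rrbracket$ entrywise to vectors and matrices. The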
radical  of  a  positive  integer  m,  denoted  rad(m),  is  the  product  of  all  primes  dividing  m. 
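
For readers who prefer code to notation, the conventions above can be sketched as follows. The function names are our own choices for this illustration and do not appear elsewhere in this work.

```python
import math


def centered_frac(a):
    """Representative of a real number modulo 1 in the interval [-1/2, 1/2)."""
    return a - math.floor(a + 0.5)


def centered_mod(a, q):
    """Representative of an integer a modulo q in the interval [-q/2, q/2)."""
    r = a % q
    return r - q if 2 * r >= q else r


def rad(m):
    """Radical of a positive integer m: the product of the distinct primes dividing m."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            result *= p
            while m % p == 0:
                m //= p
        p += 1
    return result * m if m > 1 else result


index_set = list(range(4))                                 # [k] for k = 4 is {0, 1, 2, 3}
reps_mod_5 = [centered_mod(a, 5) for a in range(5)]        # [0, 1, 2, -2, -1]
reps_mod_4 = [centered_mod(a, 4) for a in range(4)]        # [0, 1, -2, -1]
```

Note the half-open interval: for even $q$ the value $q/2$ is represented by $-q/2$, as the `reps_mod_4` example shows.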

For a vector $x$ over $\mathbb{R}$ or $\mathbb{C}$, define the $\ell_2$ norm as $\|x\|_2 = (\sum_i |x_i|^2)^{1/2}$, and the $\ell_\infty$ norm as $\|x\|_\infty =
\max_i |x_i|$. For an $n$-by-$n$ matrix $M$ we denote by $s_1(M)$ its largest singular value (also known as the spectral
or operator norm), and by $s_n(M)$ its smallest singular value.

2.1  Vectors,  Matrices,  and  Tensors 

Throughout this paper, the entries of a vector over a domain $D$ are always indexed (in no particular order)
by some finite set $S$, and we write $D^S$ to denote the set of all such vectors. When the domain is $\mathbb{Z}_q$ or a
subset of the complex numbers, we usually denote vectors using bold lower-case letters (e.g., $\mathbf{a}$), otherwise
we use arrow notation (e.g., $\vec{a}$). Similarly, the rows and columns of an “$R$-by-$C$ matrix” over $D$ are indexed
by some finite sets $R$ and $C$, respectively. We write $D^{R \times C}$ for the set of all such matrices, and typically
use upper-case letters to denote individual matrices (e.g., $A$). The $R$-by-$R$ identity matrix $I_R$ has 1 as its
$(i, i)$th entry for each $i \in R$, and 0 elsewhere. All the standard matrix and vector operations are defined in the
natural way, for objects having compatible domains and index sets.

In particular, the Kronecker (or tensor) product $M = A \otimes B$ of an $R_0$-by-$C_0$ matrix $A$ with an $R_1$-by-$C_1$
matrix $B$ is the $(R_0 \times R_1)$-by-$(C_0 \times C_1)$ matrix $M$ with entries $M_{(i_0,i_1),(j_0,j_1)} = A_{i_0,j_0} \cdot B_{i_1,j_1}$. The
Kronecker product of two vectors, or of a matrix with a vector, is defined similarly. For positive integers
$n_0, n_1$, we often implicitly identify the index set $[n_0] \times [n_1]$ with $[n_0 n_1]$, using the bijective correspondence
$(i_0, i_1) \leftrightarrow i = i_0 n_1 + i_1$; note that this matches the traditional Kronecker product for ordered rows and
columns. Similarly, when $m = \prod_\ell m_\ell$ for a set of pairwise coprime positive integers $m_\ell$, we often identify
the index sets $\mathbb{Z}_m^*$ and $\prod_\ell \mathbb{Z}_{m_\ell}^*$ via the bijection induced by the Chinese remainder theorem. In other settings
we reindex a set using another correspondence, which will be described in context.
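
The two index identifications just described can be checked directly. In the following sketch the helper names and the example values $n_0 = 2$, $n_1 = 3$, $m = 15$ are illustrative assumptions.

```python
from itertools import product
from math import gcd


def flatten(i0, i1, n1):
    """Map the pair (i0, i1) in [n0] x [n1] to the single index i0 * n1 + i1 in [n0 * n1]."""
    return i0 * n1 + i1


def units(m):
    """Representatives of the unit group Z_m^* (for m >= 2), in increasing order."""
    return [i for i in range(1, m) if gcd(i, m) == 1]


def crt_split(i, moduli):
    """Image of i under the Chinese remainder map into the product of the Z_{m_l}."""
    return tuple(i % q for q in moduli)


# The correspondence (i0, i1) <-> i0 * n1 + i1 enumerates [n0 * n1] in order.
n0, n1 = 2, 3
flat = [flatten(i0, i1, n1) for i0, i1 in product(range(n0), range(n1))]

# Z_15^* corresponds bijectively to Z_3^* x Z_5^*.
moduli = [3, 5]
images = {crt_split(i, moduli) for i in units(15)}
expected = set(product(units(3), units(5)))
```

Here `flat` runs through $0, 1, \ldots, 5$ in order, and `images` coincides with the full product set, so the Chinese remainder map restricts to a bijection on units.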

An important fact about the Kronecker product is the mixed-product property: $(A \otimes B)(C \otimes D) =
(AC) \otimes (BD)$. Using the mixed-product property, a tensor product $A = \bigotimes_\ell A_\ell$ of several matrices can be
written as

\[ A = \prod_\ell (I \otimes \cdots \otimes I \otimes A_\ell \otimes I \otimes \cdots \otimes I), \tag{2.1} \]

where the identity matrices have the appropriate induced index sets. In particular, if each $A_\ell$ is a square
matrix of dimension $n_\ell$, then $A$ is square of dimension $n = \prod_\ell n_\ell$, and multiplication by $A$ reduces to $n/n_\ell$
parallel multiplications by $A_\ell$, in sequence for each value of $\ell$ (in any order).
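
The mixed-product property and the two-factor case of Equation (2.1) can be verified on small integer matrices. The sketch below uses plain nested lists; the matrices and function names are arbitrary choices for illustration.

```python
def kron(A, B):
    """Kronecker product, with index pairs (i0, i1) identified with i0 * n1 + i1."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


A0 = [[1, 2], [3, 4]]
A1 = [[0, 1, 1], [1, 0, 2], [2, 1, 0]]
C = [[2, 0], [1, 1]]
D = [[1, 0, 0], [0, 2, 0], [1, 0, 1]]

# Mixed-product property: (A0 (x) A1)(C (x) D) = (A0 C) (x) (A1 D).
mixed_lhs = matmul(kron(A0, A1), kron(C, D))
mixed_rhs = kron(matmul(A0, C), matmul(A1, D))

# Equation (2.1) with two factors: A0 (x) A1 = (A0 (x) I)(I (x) A1), in either order.
factored = matmul(kron(A0, identity(3)), kron(identity(2), A1))
factored_swapped = matmul(kron(identity(2), A1), kron(A0, identity(3)))
```

Each factor such as $I \otimes A_1$ is block diagonal, which is why applying it amounts to independent (hence parallel) multiplications by $A_1$.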


2.2  The  Space  H 

When working with cyclotomic number fields and ideal lattices under the canonical embedding (see
Section 2.5.2 below), it is convenient to use a subspace $H \subseteq \mathbb{C}^{\mathbb{Z}_m^*}$ (for some integer $m \ge 2$), defined as

\[ H = \{ x \in \mathbb{C}^{\mathbb{Z}_m^*} : x_i = \overline{x_{m-i}}, \ \forall\, i \in \mathbb{Z}_m^* \}. \]

Letting $n = \varphi(m)$, it is not difficult to verify that $H$ (with the inner product induced on it by $\mathbb{C}^{\mathbb{Z}_m^*}$) is
isomorphic to $\mathbb{R}^{[n]}$ as an inner product space. For $m = 2$ this is trivial, and for $m > 2$ this can be seen via
the $\mathbb{Z}_m^*$-by-$[n]$ unitary basis matrix

\[ B = \frac{1}{\sqrt{2}} \begin{pmatrix} I & \sqrt{-1}\,J \\ J & -\sqrt{-1}\,I \end{pmatrix} \]

of $H$, where the $\mathbb{Z}_m^*$-indexed rows are shown in
increasing order according to their representatives in $\{1, \ldots, m-1\}$, the $[n]$-indexed columns are shown in
increasing order by index, $I$ is the identity matrix, and $J$ is the reversal matrix (obtained by reversing the
columns of $I$).

We equip $H$ with the $\ell_2$ and $\ell_\infty$ norms induced on it from $\mathbb{C}^{\mathbb{Z}_m^*}$. Namely, for $x \in H$ we have $\|x\|_2 =
(\sum_i |x_i|^2)^{1/2} = \sqrt{\langle x, x \rangle}$, and $\|x\|_\infty = \max_i |x_i|$.
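
The claims about the basis matrix $B$ of $H$ (for $m > 2$) are easy to confirm numerically. The sketch below builds $B$ in the block form $\frac{1}{\sqrt{2}}\begin{pmatrix} I & \sqrt{-1}J \\ J & -\sqrt{-1}I \end{pmatrix}$ for the example index $m = 5$, then checks that every column satisfies $x_i = \overline{x_{m-i}}$ and that $B^* B$ is the identity. The function name and the choice $m = 5$ are assumptions of this illustration.

```python
from math import gcd, sqrt


def h_basis(m):
    """Return (rows, B): the sorted representatives of Z_m^* and the basis matrix of H, for m > 2."""
    rows = [i for i in range(1, m) if gcd(i, m) == 1]
    n, half = len(rows), len(rows) // 2
    B = [[0j] * n for _ in range(n)]
    s = 1 / sqrt(2)
    for k in range(half):
        B[k][k] = s                          # top-left block: I
        B[n - 1 - k][k] = s                  # bottom-left block: J
        B[half - 1 - k][half + k] = 1j * s   # top-right block: sqrt(-1) * J
        B[half + k][half + k] = -1j * s      # bottom-right block: -sqrt(-1) * I
    return rows, B


rows, B = h_basis(5)
n = len(rows)

# Row t is indexed by rows[t], and m - rows[t] == rows[n - 1 - t], so membership in H
# means that entry t of each column is the complex conjugate of entry n - 1 - t.
columns_in_H = all(
    abs(B[t][j] - B[n - 1 - t][j].conjugate()) < 1e-12
    for t in range(n) for j in range(n)
)

# Unitarity: the Gram matrix B^* B equals the identity.
gram = [[sum(B[t][i].conjugate() * B[t][j] for t in range(n)) for j in range(n)]
        for i in range(n)]
is_unitary = all(
    abs(gram[i][j] - (1 if i == j else 0)) < 1e-12
    for i in range(n) for j in range(n)
)
```

Because $B$ is unitary and its columns lie in $H$, the map $x \mapsto Bx$ from $\mathbb{R}^{[n]}$ to $H$ preserves inner products, which is the isomorphism referred to above.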


Gram-Schmidt orthogonalization. For an ordered set $B = \{b_j\}_{j \in [n]} \subset H$ of linearly independent
vectors, the Gram-Schmidt orthogonalization $\tilde{B} = \{\tilde{b}_j\}$ is defined iteratively as follows: $\tilde{b}_0 = b_0$, and for
$j = 1, 2, \ldots, n-1$, $\tilde{b}_j$ is the component of $b_j$ orthogonal to the linear span of $b_0, \ldots, b_{j-1}$:

\[ \tilde{b}_j = b_j - \sum_{k \in [j]} \tilde{b}_k \cdot \langle b_j, \tilde{b}_k \rangle / \langle \tilde{b}_k, \tilde{b}_k \rangle. \]

Viewing $B$ as a matrix whose columns are the vectors $b_j$, its orthogonalization corresponds to the unique
factorization $B = QDU$, where $Q$ is unitary with columns $\tilde{b}_j / \|\tilde{b}_j\|_2$; $D$ is real diagonal with positive
diagonal entries $\|\tilde{b}_j\|_2 > 0$; and $U$ is real upper unitriangular with entries $w_{k,j} = \langle b_j, \tilde{b}_k \rangle / \langle \tilde{b}_k, \tilde{b}_k \rangle$ (see footnote 4). The
Gram-Schmidt orthogonalization is $\tilde{B} = QD$, and so $B = \tilde{B}U$. The real positive definite Gram matrix of $B$
is $B^* B = U^T D^2 U$. Because $U$ is upper unitriangular, this is exactly the Cholesky decomposition of $B^* B$,
which is unique; it therefore determines the matrices $D$, $U$ in the Gram-Schmidt orthogonalization of $B$. One
can also verify from the definitions that $D^2$ and $U$ are both rational if the Gram matrix is rational.
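
A minimal sketch of the iteration, restricted for simplicity to real vectors with rational entries (an assumption of this example; in the text the vectors live in $H$), is given below. It uses exact rational arithmetic so that the identities $B = \tilde{B}U$ and $B^* B = U^T D^2 U$ can be checked exactly, and it makes the rationality of $D^2$ and $U$ visible.

```python
from fractions import Fraction


def inner(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(basis):
    """Return (ortho, U): ortho[j] is the j-th orthogonalized vector, and
    U[k][j] = <b_j, ortho_k> / <ortho_k, ortho_k> is upper unitriangular."""
    n = len(basis)
    ortho = []
    U = [[Fraction(int(k == j)) for j in range(n)] for k in range(n)]
    for j, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for k in range(j):
            U[k][j] = inner(b, ortho[k]) / inner(ortho[k], ortho[k])
            v = [vi - U[k][j] * oi for vi, oi in zip(v, ortho[k])]
        ortho.append(v)
    return ortho, U


basis = [[3, 1], [2, 2]]  # the vectors b_0 and b_1
ortho, U = gram_schmidt(basis)
n = len(basis)

# B = B~ U: each b_j is the combination sum_k U[k][j] * ortho_k.
rebuilt = [[sum(U[k][j] * ortho[k][i] for k in range(n)) for i in range(len(basis[0]))]
           for j in range(n)]

# Gram matrix versus U^T D^2 U, where D^2 holds the squared norms of the ortho vectors.
d_squared = [inner(v, v) for v in ortho]
gram = [[inner(bi, bj) for bj in basis] for bi in basis]
cholesky = [[sum(U[k][i] * d_squared[k] * U[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For this input the only off-diagonal entry of $U$ is $4/5$ and $D^2 = \mathrm{diag}(10, 8/5)$, both rational, as expected for an integer Gram matrix.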

Footnote 4: This is often referred to as the “QR” factorization, though here we have also factored out the diagonal entries of the
upper-triangular matrix $R$ into $D$, making $U$ unitriangular.

2.3 Gaussians and Subgaussian Random Variables

For $s > 0$, define the Gaussian function $\rho_s \colon H \to (0, 1]$ as $\rho_s(x) = \exp(-\pi \langle x, x \rangle / s^2) = \exp(-\pi \|x\|_2^2 / s^2)$.
By normalizing this function we obtain the continuous Gaussian probability distribution $D_s$ of parameter $s$,
whose density is given by $s^{-n} \cdot \rho_s(x)$.

For much of our analysis it is convenient to use the standard notion of subgaussian random variables,
relaxed slightly as in [MP12]. (For further details and full proofs, see, e.g., [Ver11].) For any $\delta \ge 0$, we say
that a random variable $X$ (or its distribution) over $\mathbb{R}$ is $\delta$-subgaussian with parameter $s > 0$ if for all $t \in \mathbb{R}$,
the (scaled) moment-generating function satisfies

\[ \mathbb{E}[\exp(2\pi t X)] \le \exp(\delta) \cdot \exp(\pi s^2 t^2). \]

Notice that the $\exp(\pi s^2 t^2)$ term on the right is exactly the (scaled) moment-generating function of the
one-dimensional Gaussian distribution of parameter $s$ over $\mathbb{R}$. It is easy to see that if $X$ is $\delta$-subgaussian with
parameter $s$, then $cX$ is $\delta$-subgaussian with parameter $|c|s$ for any real $c$. In addition, by Markov’s inequality,
the tails of $X$ are dominated by those of a Gaussian of parameter $s$, i.e., for all $t \ge 0$,

\[ \Pr[|X| \ge t] \le 2\exp(\delta - \pi t^2 / s^2). \tag{2.2} \]

Using the inequality $\cosh(x) \le \exp(x^2/2)$, it can be shown that any $B$-bounded centered random variable $X$
(i.e., $\mathbb{E}[X] = 0$ and $|X| \le B$ always) is 0-subgaussian with parameter $B\sqrt{2\pi}$.
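
As a sanity check of the last statement, consider the simplest $B$-bounded centered variable, which takes the values $\pm B$ with probability $1/2$ each. Its scaled moment-generating function is $\cosh(2\pi t B)$, and with $s = B\sqrt{2\pi}$ the claimed bound $\exp(\pi s^2 t^2)$ equals $\exp((2\pi t B)^2/2)$. The sketch below (with an arbitrary choice $B = 3$ and a grid of values of $t$; all names are assumptions of this illustration) compares the two numerically and also checks the tail bound (2.2) at $t = B$.

```python
import math


def rademacher_mgf(B, t):
    """E[exp(2*pi*t*X)] for X uniform on {-B, +B}."""
    return math.cosh(2 * math.pi * t * B)


def subgaussian_bound(s, t, delta=0.0):
    """Right-hand side exp(delta) * exp(pi * s^2 * t^2) of the subgaussian condition."""
    return math.exp(delta) * math.exp(math.pi * s * s * t * t)


B = 3.0
s = B * math.sqrt(2 * math.pi)  # the parameter claimed in the text for B-bounded variables
ts = [k / 100 for k in range(-50, 51)]

# The moment-generating function never exceeds the bound (small slack for rounding).
holds = all(rademacher_mgf(B, t) <= subgaussian_bound(s, t) * (1 + 1e-12) for t in ts)

# Tail bound (2.2) with delta = 0 at t = B, where Pr[|X| >= B] = 1.
tail_bound_at_B = 2 * math.exp(-math.pi * B ** 2 / s ** 2)  # equals 2 * exp(-1/2)
tail_ok = tail_bound_at_B >= 1.0
```

The tail bound at $t = B$ evaluates to $2e^{-1/2} \approx 1.21$, which is consistent with the true probability of 1.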

The  sum  of  independent  subgaussian  variables  is  easily  seen  to  be  subgaussian.  Here  we  observe  that  the 
same  holds  even  in  a  martingale-like  setting. 

Claim 2.1. Let $\delta_i, s_i \ge 0$ and $X_i$ be random variables for $i = 1, \ldots, k$. Suppose that for every $i$, when
conditioning on any values of $X_1, \ldots, X_{i-1}$, the random variable $X_i$ is $\delta_i$-subgaussian with parameter $s_i$.
Then $\sum_i X_i$ is $(\sum_i \delta_i)$-subgaussian with parameter $(\sum_i s_i^2)^{1/2}$.


Proof. It suffices to prove the claim for $k = 2$; the general case follows by induction, since $X_k$ is subgaussian
conditioned on any value of $\sum_{i=1}^{k-1} X_i$. Indeed,

\[ \mathbb{E}[\exp(2\pi t (X_1 + X_2))] = \mathbb{E}_{X_1}\bigl[\exp(2\pi t X_1)\, \mathbb{E}_{X_2}[\exp(2\pi t X_2) \mid X_1]\bigr] \le \exp(\delta_1 + \delta_2) \exp(\pi (s_1^2 + s_2^2) t^2). \]

□


We also have the following bound on the tail of a sum of squares of independent subgaussian variables.

Lemma 2.2. Let $X$ be a $\delta$-subgaussian random variable with parameter $s$. Then, for any $t \in (0, 1/(2s^2))$,

\[ \mathbb{E}[\exp(2\pi t X^2)] \le 1 + 2\exp(\delta) \Bigl( \frac{1}{2ts^2} - 1 \Bigr)^{-1}. \]

Moreover, if $X_1, \ldots, X_k$ are random variables, each of which is $\delta$-subgaussian with parameter $s$ conditioned
on any values of the previous ones, then for any $r > k's^2/\pi$ where $k' = 2k \exp(\delta)$ we have that

\[ \Pr\Bigl[ \sum_i X_i^2 > r \Bigr] \le \exp\Bigl( k' \Bigl( 2 \Bigl( \frac{\pi r}{k's^2} \Bigr)^{1/2} - \frac{\pi r}{k's^2} - 1 \Bigr) \Bigr). \]

In particular, using the inequality $2a^{1/2} - a - 1 \le -a/4$ valid for all $a \ge 4$, we obtain that for any
$r \ge 4k's^2/\pi$,

\[ \Pr\Bigl[ \sum_i X_i^2 > r \Bigr] \le \exp\Bigl( -\frac{\pi r}{4s^2} \Bigr). \]
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Proof.  Using  integration  by  parts  and  (|2.2[), 


E  [exp(27rtW2)]  =1+  /  Pr[|JT|  >  r\  ■  Airtr  exp(2ntr2)dr 

Jo 

r  exp(— irr2  /  s2  +  2ntr2)dr 
=  1  +  2eXpm(U_-1)  ^ 

<exp(2exp(i)(U_-l)  '), 

where  the  last  equality  uses  that  for  every  a  >  0,  J0°°  r  exp(— or2)dr  =  (2a)-1.  This  completes  the  first  part 
of  the  lemma.  For  the  second  paid,  notice  that  by  the  above,  if  X\, . . . ,  Xp.  are  as  in  the  statement,  we  have 
for  any  t  E  (0,  l/(2s2)), 


<  1  +  8irt  exp(<5) 


E 


exp(27rf  £  X‘l )j  <  exp^2A;  exp(<5) 


2fs2 


-  1 


-i 


and  hence  by  Markov’s  inequality,  for  all  r  >  0  and  t  E  (0,  l/(2s2)), 


Pr  X'l  >  r  <  expf2fcexp(<5) 


1 


2  ts2 


-i 


—  1  —  2ntr  . 


Letting  x  =  2s2t  E  (0, 1)  and  A  =  irr/(s2k')  >  1,  the  expression  inside  the  exponent  is 

2A:exp((5)^- —  1^  —  Ax^j  . 

The  lemma  follows  using  the  fact  that  for  any  A  >  1,  the  minimum  over  x  E  (0, 1)  of  the  expression  inside 
the  parenthesis  is  2\fA  —  A  —  1  (obtained  at  1  —  1/ \/)4).  Q 
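For completeness, the final minimization in the proof of Lemma 2.2 can be spelled out as the following routine calculus step, written with the same $x$ and $A$ as in the proof; the name $f$ is introduced here only for illustration.

```latex
\begin{align*}
f(x) &= \Bigl(\tfrac{1}{x} - 1\Bigr)^{-1} - Ax = \frac{x}{1-x} - Ax,
  \qquad f'(x) = \frac{1}{(1-x)^2} - A, \\
f'(x) = 0 &\iff 1 - x = A^{-1/2} \iff x = 1 - A^{-1/2} \in (0,1) \quad (A > 1), \\
f\bigl(1 - A^{-1/2}\bigr) &= \bigl(1 - A^{-1/2}\bigr) A^{1/2} - A \bigl(1 - A^{-1/2}\bigr)
  = 2\sqrt{A} - A - 1 .
\end{align*}
```

Since $f''(x) = 2/(1-x)^3 > 0$ on $(0,1)$, this critical point is indeed the minimum.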


We extend the notion of subgaussianity to random vectors in $\mathbb{R}^n$ (or equivalently, in $H$). Specifically, we say that a random vector $X$ in $\mathbb{R}^n$ is $\delta$-subgaussian with parameter $s$ if for all unit vectors $u \in \mathbb{R}^n$, the random variable $\langle X, u \rangle$ is $\delta$-subgaussian with parameter $s$. It follows from Claim 2.1 that if the coordinates of a random vector in $\mathbb{R}^n$ are independent, and each is $\delta$-subgaussian with parameter $s$, then the random vector is $n\delta$-subgaussian with the same parameter $s$.

Sums of subgaussian random vectors are again easily seen to be subgaussian, even in the martingale setting as in Claim 2.1 above. We summarize this in the following corollary, which considers the more general setting in which we apply a (possibly different) linear transformation to each subgaussian random vector.


Corollary 2.3. Let $\delta_i, s_i \ge 0$ and $X_i$ be random vectors in $\mathbb{R}^n$ (or in $H$), and let $A_i$ be $n \times n$ matrices for $i = 1, \ldots, k$. Suppose that for every $i$, when conditioning on any values of $X_1, \ldots, X_{i-1}$, the random vector $X_i$ is $\delta_i$-subgaussian with parameter $s_i$. Then $\sum_i A_i X_i$ is $(\sum_i \delta_i)$-subgaussian with parameter $\lambda_{\max}(\sum_i s_i^2 A_i A_i^T)^{1/2}$, where $\lambda_{\max}$ denotes the largest eigenvalue.


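Proof. For any vector $u \in \mathbb{R}^n$,

\[ \Bigl\langle \sum_i A_i X_i, u \Bigr\rangle = \sum_i \langle A_i X_i, u \rangle = \sum_i \langle X_i, A_i^T u \rangle, \]

which is a sum of random variables satisfying that for each $i$, the $i$th variable is $\delta_i$-subgaussian with parameter $s_i \|A_i^T u\|_2$ conditioned on any value of the previous ones. By Claim 2.1 this sum is $(\sum_i \delta_i)$-subgaussian with parameter

\[ \Bigl( \sum_i s_i^2 \|A_i^T u\|_2^2 \Bigr)^{1/2} = \Bigl( u^T \Bigl( \sum_i s_i^2 A_i A_i^T \Bigr) u \Bigr)^{1/2}, \]

whose maximum over all unit vectors $u$ is $\lambda_{\max}(\sum_i s_i^2 A_i A_i^T)^{1/2}$. □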


By applying Corollary 2.3 with the linear transformation induced by coordinate-wise multiplication in $H \subseteq \mathbb{C}^{\mathbb{Z}_m^*}$ we obtain the following.

Claim 2.4. If $X$ is $\delta$-subgaussian with parameter $s$ in $H$, and $z \in H$ is any element, then the coordinate-wise multiplication $z \odot X \in H$ is $\delta$-subgaussian with parameter $\|z\|_\infty \cdot s$. More generally, if $X_j \in H$ are random vectors satisfying the property in Corollary 2.3 for some $\delta_j, s_j \ge 0$ (respectively), then for any $z_j \in H$, we have that $\sum_j z_j \odot X_j \in H$ is $(\sum_j \delta_j)$-subgaussian with parameter $\max_i (\sum_j s_j^2 |(z_j)_i|^2)^{1/2}$.
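The step from Corollary 2.3 to Claim 2.4 can be sketched as follows. The sketch assumes that coordinate-wise multiplication by $z_j$ is represented by the diagonal matrix $A_j = \operatorname{diag}(z_j)$ and that, in the complex coordinates of $H$, the transpose in the corollary is read as a conjugate transpose; both are assumptions about how the corollary is instantiated, not statements from the text.

```latex
\begin{align*}
\sum_j s_j^2 A_j A_j^{*}
  &= \operatorname{diag}\Bigl( \sum_j s_j^2 \, |(z_j)_i|^2 \Bigr)_{i}, \\
\lambda_{\max}\Bigl( \sum_j s_j^2 A_j A_j^{*} \Bigr)^{1/2}
  &= \max_i \Bigl( \sum_j s_j^2 \, |(z_j)_i|^2 \Bigr)^{1/2}.
\end{align*}
```

For a single term this reduces to $\max_i s\,|z_i| = \|z\|_\infty \cdot s$, matching the first part of the claim.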


2.4  Lattice  Background 

We define a lattice as a discrete additive subgroup of $H$. We deal here exclusively with full-rank lattices, which are generated as the set of all integer linear combinations of some set of $n$ linearly independent basis vectors $B = \{b_j\} \subset H$:

\[ \Lambda = \mathcal{L}(B) = \Bigl\{ \sum_j z_j b_j : z_j \in \mathbb{Z} \Bigr\}. \]

Two bases $B, B'$ generate the same lattice if and only if there exists a unimodular matrix $U$ (i.e., integer matrix with determinant $\pm 1$) such that $BU = B'$. The determinant of a lattice $\mathcal{L}(B)$ is defined as $|\det(B)|$, which is independent of the choice of basis $B$. The minimum distance $\lambda_1(\Lambda)$ of a lattice $\Lambda$ (in the Euclidean norm) is the length of a shortest nonzero lattice vector: $\lambda_1(\Lambda) = \min_{0 \ne x \in \Lambda} \|x\|_2$.

The dual lattice of $\Lambda \subset H$ is defined as $\Lambda^\vee = \{ y \in H : \forall x \in \Lambda, \ \langle x, \bar{y} \rangle = \sum_i x_i y_i \in \mathbb{Z} \}$. Notice that this is actually the complex conjugate of the dual lattice as usually defined in $\mathbb{C}^n$; our definition corresponds more naturally to the notion of duality in algebraic number theory (see Section 2.5.4). All of the properties of the dual lattice that we use also hold for the conjugate dual. In particular, $\det(\Lambda^\vee)$ is $\det(\Lambda)^{-1}$.

It is easy to see that $(\Lambda^\vee)^\vee = \Lambda$. If $B = \{b_j\} \subset H$ is a set of linearly independent vectors (i.e., an $\mathbb{R}$-basis of $H$), its dual basis $D = \{d_j\}$ is characterized by $\langle b_j, \bar{d}_k \rangle = \delta_{jk}$, where $\delta_{jk}$ is the Kronecker delta. It is easy to verify that $\mathcal{L}(D) = \mathcal{L}(B)^\vee$.

Micciancio and Regev [MR04] introduced a lattice quantity called the smoothing parameter, and related it to various lattice quantities.

Definition 2.5. For a lattice $\Lambda$ and positive real $\epsilon > 0$, the smoothing parameter $\eta_\epsilon(\Lambda)$ is the smallest $s$ such that $\rho_{1/s}(\Lambda^\vee \setminus \{0\}) \le \epsilon$.

Lemma 2.6 ([MR04, Lemma 3.2]). For any $n$-dimensional lattice $\Lambda$, we have $\eta_{2^{-2n}}(\Lambda) \le \sqrt{n} / \lambda_1(\Lambda^\vee)$ (see footnote 5).


Lemma 2.7 ([Reg05, Claim 3.8]). For any lattice $\Lambda$, real $\epsilon > 0$ and $s \ge \eta_\epsilon(\Lambda)$, and $c \in H$, we have $\rho_s(\Lambda + c) \in [1 \pm \epsilon] \cdot s^n \det(\Lambda)^{-1}$.


5 Note that we are using $\epsilon = 2^{-2n}$ instead of $2^{-n}$ as in [MR04], but the stronger bound holds by the same proof.

For a lattice coset $\Lambda + c$ and real $s > 0$, define the discrete Gaussian probability distribution over $\Lambda + c$ with parameter $s$ as

\[ D_{\Lambda + c, s}(x) = \frac{\rho_s(x)}{\rho_s(\Lambda + c)} \quad \forall x \in \Lambda + c. \tag{2.3} \]

It  is  known  to  satisfy  the  following  concentration  bound. 

Lemma 2.8 ([Ban93, Lemma 1.5(i)]). For any $n$-dimensional lattice $\Lambda$ and $s > 0$, a point sampled from $D_{\Lambda, s}$ has Euclidean norm at most $s\sqrt{n}$, except with probability at most $2^{-2n}$.

Gentry, Peikert, and Vaikuntanathan [GPV08] showed how to efficiently sample from a discrete Gaussian, using any lattice basis consisting of sufficiently short orthogonalized vectors.

Lemma 2.9 ([GPV08, Theorem 4.1]). There is an efficient algorithm that samples to within $\mathrm{negl}(n)$ statistical distance of $D_{\Lambda + c, s}$, given $c \in H$, a basis $B$ of $\Lambda$, and a parameter $s \ge \max_j \|\tilde{b}_j\| \cdot \omega(\sqrt{\log n})$, where $\tilde{B} = \{\tilde{b}_j\}$ is the Gram-Schmidt orthogonalization of $B$.

We make a few remarks on the implementation of the algorithm from Lemma 2.9. It is a randomized variant of Babai's "nearest plane" algorithm [Bab85] (a related variant was also considered by Klein [Kle00] for a different problem). On input $c \in H$, and a basis $B$ and parameter $s$ satisfying the above constraint, it does the following: for $j = n - 1, \ldots, 0$, let $c \leftarrow c - z_j b_j$, where $z_j \leftarrow c'_j + D_{\mathbb{Z} - c'_j, s_j}$ for $c'_j = \langle c, \tilde{b}_j \rangle / \langle \tilde{b}_j, \tilde{b}_j \rangle$ and $s_j = s / \|\tilde{b}_j\|_2$. Output the final value of $c$.
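The loop just described can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not part of the text: vectors are real (floating-point) lists rather than elements of $H$, the one-dimensional sampler uses naive rejection sampling on a truncated range (so it is only approximately Gaussian, and is efficient only when each $s_j$ is not much smaller than 1), and all function names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Gram-Schmidt orthogonalization (without normalization) of the basis vectors, in order."""
    ortho = []
    for b in basis:
        v = list(b)
        for u in ortho:
            mu = dot(b, u) / dot(u, u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho

def sample_integer_gaussian(center, s, rng, tail=6.0):
    """Rejection-sample an integer k with probability proportional to
    exp(-pi * (k - center)^2 / s^2), restricted to roughly |k - center| <= tail * s."""
    lo = math.floor(center - tail * s)
    hi = math.ceil(center + tail * s)
    while True:
        k = rng.randint(lo, hi)
        if rng.random() < math.exp(-math.pi * (k - center) ** 2 / (s * s)):
            return k

def sample_coset(basis, c, s, rng):
    """Randomized nearest-plane: returns (y, z) with y = c - sum_j z[j] * basis[j]."""
    ortho = gram_schmidt(basis)
    y = list(c)
    z = [0] * len(basis)
    for j in reversed(range(len(basis))):
        bt = ortho[j]
        c_j = dot(y, bt) / dot(bt, bt)          # c'_j in the text
        s_j = s / math.sqrt(dot(bt, bt))        # s_j = s / ||b~_j||
        z[j] = sample_integer_gaussian(c_j, s_j, rng)
        y = [yi - z[j] * bi for yi, bi in zip(y, basis[j])]
    return y, z
```

Because every $z_j$ is an integer, the output differs from the input $c$ by a lattice vector, so it stays in the coset $\Lambda + c$; a production implementation would replace the rejection sampler and the floating-point arithmetic with the exact rational representation discussed next.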

In practice, the above algorithm is usually invoked on a fixed basis $B$ whose Gram matrix $B^* B$ is rational. It is best implemented by precomputing the rational matrices $D^2$, $U$ associated with $B$ and $B^* B$ (see Section 2.2), and by representing the input and intermediate values $c$ using rational coefficient vectors with respect to $B$. Then each value $c'_j = \langle c, \tilde{b}_j \rangle / \langle \tilde{b}_j, \tilde{b}_j \rangle$ can be computed simply as the inner product of $c$'s coefficient vector with the $j$th row of $U$.

2.4.1  Decoding 

In many applications we need to perform the following algorithmic task, which is essentially a bounded-distance decoding. Let $\Lambda$ be a known fixed lattice, and let $x \in H$ be an unknown short vector. The goal is to recover $x$, given $t = x \bmod \Lambda$. Although there are several possible algorithms for this task, here we focus on a slight extension of the so-called "round-off" algorithm originally due to Babai [Bab85]. This is due to its high efficiency and because for our purposes it performs optimally (or nearly so). The algorithm is very simple: let $\{v_i\}$ be a fixed set of $n$ linearly independent (and typically short) vectors in the dual lattice $\Lambda^\vee$. Denote the dual basis of $\{v_i\}$ by $\{b_i\}$, and let $\Lambda' \supseteq \Lambda$ be the superlattice generated by $\{b_i\}$. Given an input $t = x \bmod \Lambda$, we express $t \bmod \Lambda'$ in the basis $\{b_i\}$ as $\sum_i c_i b_i$, where $c_i \in \mathbb{R}/\mathbb{Z}$ (so $c_i = \langle x, \bar{v}_i \rangle \bmod 1$), and output $\sum_i [\![ c_i ]\!] b_i \in H$, where $[\![ c_i ]\!] \in [-1/2, 1/2)$ denotes the distinguished representative of $c_i$.

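Claim 2.10. Let $\Lambda \subset H$ be a lattice, let $\{v_i\} \subset \Lambda^\vee$ be a set of $n$ linearly independent vectors in its dual, and let $\{b_i\}$ denote the dual basis of $\{v_i\}$ (which generates the superlattice $\Lambda' \supseteq \Lambda$, as above). The above round-off algorithm, given input $x \bmod \Lambda$, outputs $x$ if and only if all the coefficients $a_i = \langle x, \bar{v}_i \rangle \in \mathbb{R}$ in the expansion $x = \sum_i a_i b_i$ are in $[-1/2, 1/2)$.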

We remark that in Babai's round-off algorithm one often assumes that $\{v_i\}$ is a basis of $\Lambda^\vee$ (and hence $\{b_i\}$ is a basis of $\Lambda$), whereas here we consider the more general case where $\{v_i\}$ can be an arbitrary set of linearly independent vectors in $\Lambda^\vee$. For some lattices (including those appearing in our applications) this can make a big difference. Consider for instance the lattice of all points in $\mathbb{Z}^n$ whose coordinates sum to an even number. The dual of this lattice is $\mathbb{Z}^n \cup (\mathbb{Z}^n + (1, \ldots, 1)/2)$, and clearly any basis of this dual must contain a vector of length at least $\sqrt{n}/2$. As a result, when limited to using a basis, the round-off algorithm can fail for vectors of length greater than $1/\sqrt{n}$. However, the dual lattice clearly has a set of $n$ linearly independent vectors of length 1, allowing us to decode up to length $1/2$.
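The example above can be made concrete with a small sketch of the round-off decoder. It assumes real vectors (so no complex conjugation is needed) and uses the standard unit vectors as the set $\{v_i\}$ for the even-coordinate-sum lattice; the function name and the sample points are illustrative only.

```python
import math

def round_off_decode(t, dual_vectors, dual_basis):
    """Round-off decoding: try to recover x from t = x mod Lambda.

    dual_vectors: linearly independent v_i in the dual lattice;
    dual_basis:   the b_i satisfying <b_i, v_j> = 1 if i == j else 0.
    """
    out = [0.0] * len(t)
    for v, b in zip(dual_vectors, dual_basis):
        c = sum(ti * vi for ti, vi in zip(t, v))   # <t, v_i>, only meaningful modulo 1
        c -= math.floor(c + 0.5)                   # representative in [-1/2, 1/2)
        out = [oi + c * bi for oi, bi in zip(out, b)]
    return out

# Even-coordinate-sum lattice in dimension 4: the unit vectors e_i lie in its dual
# and are their own dual basis.
n = 4
unit = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
x = [0.3, -0.2, 0.1, 0.25]          # Euclidean length 0.45 < 1/2
lattice_point = [1, 1, 2, 0]        # coordinates sum to an even number
t = [xi + zi for xi, zi in zip(x, lattice_point)]
decoded = round_off_decode(t, unit, unit)
```

Here `decoded` agrees with `x` up to floating-point error, in line with Claim 2.10; an input such as `[0.6, 0, 0, 0]`, whose first coefficient falls outside $[-1/2, 1/2)$, is instead decoded to `[-0.4, 0, 0, 0]`.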


2.4.2  Discretization 


We now consider another algorithmic task related to the one in the previous subsection. This task shows up in applications, such as when converting a continuous Gaussian into a discrete Gaussian-like distribution. Given a lattice $\Lambda = \mathcal{L}(B)$ represented by a "good" basis $B = \{b_i\}$, a point $x \in H$, and a point $c \in H$ representing a lattice coset $\Lambda + c$, the goal is to discretize $x$ to a point $y \in \Lambda + c$, written $y \leftarrow \lfloor x \rceil_{\Lambda + c}$, so that the length (or subgaussian parameter) of $y - x$ is not too large. To do this, we sample a relatively short offset vector $f$ from the coset $\Lambda + c' = \Lambda + (c - x)$ in one of a few natural ways described below, and output $y = x + f$. We require that the method used to choose $f$ be efficient and depend only on the desired coset $\Lambda + c'$, not on the particular representative used to specify it; we call such a procedure (or the induced discretization) valid.

Note that for a valid discretization, $\lfloor z + x \rceil_{\Lambda + c}$ and $z + \lfloor x \rceil_{\Lambda + c}$ are identically distributed for any $z \in \Lambda$. Therefore, for any sublattice $\Lambda' \subseteq \Lambda$, a valid discretization also induces a well-defined discretization from any coset $\bar{x} = \Lambda' + x$ to $\bar{y} = \bar{x} + f = \Lambda' + y$, where $y \in \Lambda + c$.

There are several valid ways of sampling $f$, offering tradeoffs between efficiency and output guarantees:


• A particularly simple and efficient method is "coordinate-wise randomized rounding:" given a coset $\Lambda + c'$, we represent $c'$ in the basis $B$ as $c' = \sum_i a_i b_i \bmod \Lambda$ for some coefficients $a_i \in [0, 1)$, then randomly and independently choose each $f_i$ from $\{a_i - 1, a_i\}$ to have expectation zero, and output $f = \sum_i f_i b_i \in \Lambda + c'$. The validity of this procedure is immediate, since any representative of $\Lambda + c'$ induces the same $a_i$ values. Because each $f_i$ has expectation zero and is bounded by 1 in magnitude, it is 0-subgaussian with parameter $\sqrt{2\pi}$ (see Section 2.3), and hence so is the entire vector of $f_i$ values. By Corollary 2.3 (applied with just one random vector), we conclude that $f$ is 0-subgaussian with parameter $\sqrt{2\pi} \cdot s_1(B)$.


• In some settings we can use a deterministic version of the above method, where we instead compute coefficients $a_i \in [-1/2, 1/2)$ and simply output $f = \sum_i a_i b_i$. When, for example, $x$ comes from a sufficiently wide continuous Gaussian, this method yields $y = x + f$ having a (very slightly) better subgaussian parameter than the randomized method. However, the analysis is a bit more involved, and we omit it.


• If $x$ has a continuous or discrete Gaussian distribution, then using more sophisticated rounding methods it is possible to make $y$ also be distributed according to a true discrete Gaussian (of some particular covariance), which is needed in some applications (though not any we develop in this paper). By [Pei10, Theorem 3.1], under mild conditions it suffices for $f$ to be distributed as a discrete Gaussian over $\Lambda + c'$, and the covariance parameter of $y$ will be the sum of those of $x$ and $f$. Using the algorithm from Lemma 2.9 we can sample a discrete Gaussian $f$ with parameter bounded by $\max_j \|\tilde{b}_j\| \cdot \omega(\sqrt{\log n})$. Alternatively, a simpler and more efficient randomized round-off algorithm obtains a parameter bounded by $s_1(B) \cdot \omega(\sqrt{\log n})$ [Pei10]. Both of these methods are easily seen to be valid, though note that they yield slightly worse Gaussian parameters than the two simpler methods described above.
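The first (coordinate-wise randomized rounding) method can be sketched as follows. To keep the example exact and free of linear algebra, it assumes that the coset representative $c'$ is already given by its rational coefficient vector with respect to $B$, and it returns the coefficients $f_i$ of the offset $f = \sum_i f_i b_i$; the function name is hypothetical.

```python
import random
from fractions import Fraction

def randomized_rounding_offset(coeffs, rng):
    """Coordinate-wise randomized rounding.

    coeffs: coefficients (w.r.t. the basis B) of any representative c' of the coset.
    Returns the coefficients f_i of an offset f in Lambda + c'.
    """
    out = []
    for a in coeffs:
        a = a - (a // 1)                  # reduce the coefficient modulo 1 into [0, 1)
        # Choose a - 1 with probability a and a with probability 1 - a,
        # so that E[f_i] = a * (a - 1) + (1 - a) * a = 0.
        out.append(a - 1 if rng.random() < a else a)
    return out

c_prime = [Fraction(7, 4), Fraction(-1, 3), Fraction(2)]
f = randomized_rounding_offset(c_prime, random.Random(1))
```

Each output coefficient differs from the corresponding input coefficient by an integer, so $f$ lies in the requested coset, and because only the coefficients modulo 1 are used, shifting `c_prime` by any integer vector (that is, choosing another representative) and reusing the same random seed yields the same `f`, which is the validity property required above.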
2.5  Algebraic  Number  Theory  Background

Algebraic number theory is the study of number fields. Here we review the necessary background, specialized to the case of cyclotomic number fields, which are the only kind we use in this work. More background and complete proofs can be found in any introductory book on the subject, e.g., [Ste04, Lan94], and especially the latter reference for material related to the tensorial decomposition.

2.5.1  Cyclotomic  Number  Fields  and  Polynomials 

For a positive integer $m$, the $m$th cyclotomic number field is a field extension $K = \mathbb{Q}(\zeta_m)$ obtained by adjoining an element $\zeta_m$ of order $m$ (i.e., a primitive $m$th root of unity) to the rationals. (Note that we view $\zeta_m$ as an abstract element, and not, for example, as any particular value in $\mathbb{C}$.) The minimal polynomial of $\zeta_m$ is the $m$th cyclotomic polynomial

\[ \Phi_m(X) = \prod_{i \in \mathbb{Z}_m^*} (X - \omega_m^i) \in \mathbb{Z}[X], \tag{2.4} \]

where $\omega_m \in \mathbb{C}$ is any primitive $m$th root of unity in $\mathbb{C}$, e.g., $\omega_m = \exp(2\pi\sqrt{-1}/m)$. Therefore, there is a natural isomorphism between $K$ and $\mathbb{Q}[X]/(\Phi_m(X))$, given by $\zeta_m \mapsto X$. Since $\Phi_m(X)$ has degree $n = |\mathbb{Z}_m^*| = \varphi(m)$, we can view $K$ as a vector space of degree $n$ over $\mathbb{Q}$, which has $(\zeta_m^j)_{j \in [n]} = (1, \zeta_m, \ldots, \zeta_m^{n-1}) \in K^{[n]}$ as a basis. This is called the power basis of $K$.

We  recall  two  useful  facts  about  cyclotomic  polynomials,  which  can  be  verified  by  examining  the  roots  of 
both  sides  of  each  equation. 

Fact 2.11. For any $m$, we have $X^m - 1 = \prod_{d \mid m} \Phi_d(X)$, where $d$ runs over all the positive divisors of $m$. In particular, $\Phi_p(X) = 1 + X + X^2 + \cdots + X^{p-1}$ for any prime $p$.

Fact 2.12. For any $m$, we have $\Phi_m(X) = \Phi_{\mathrm{rad}(m)}(X^{m/\mathrm{rad}(m)})$, where recall that $\mathrm{rad}(m)$ is the product of all distinct primes dividing $m$. In particular, if $m$ is a power of a prime $p$, then $\Phi_m(X) = \Phi_p(X^{m/p})$.

For instance, $\Phi_8(X) = 1 + X^4$ and $\Phi_{25}(X) = 1 + X^5 + X^{10} + X^{15} + X^{20}$.

For any $m'$ dividing $m$, it is often convenient to view $K' = \mathbb{Q}(\zeta_{m'})$ as a subfield of $K = \mathbb{Q}(\zeta_m)$, by identifying $\zeta_{m'}$ with $\zeta_m^{m/m'}$.

Non-prime-power cyclotomics. Not all cyclotomic polynomials are "regular"-looking or have 0-1 (or even small) coefficients. Generally speaking, the irregularity and range of coefficients grows with the number of prime divisors of $m$. For example, $\Phi_6(X) = X^2 - X + 1$; $\Phi_{3 \cdot 5 \cdot 7}(X)$ has 33 monomials with coefficients $-2$, $-1$, and $1$; and $\Phi_{3 \cdot 5 \cdot 7 \cdot 11 \cdot 13}(X)$ has coefficients of magnitude up to 22. Fortunately, the form of $\Phi_m(X)$ for non-prime-power $m$ will never be a concern in this work, due to an alternative way of viewing $K = \mathbb{Q}(\zeta_m)$ by reducing to the case of prime-power cyclotomics.
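The small examples in this subsection are easy to reproduce. The sketch below computes $\Phi_m$ by repeatedly dividing $X^m - 1$ by $\Phi_d$ for the proper divisors $d$ of $m$, which is one direct way of using Fact 2.11 (it is not claimed to be the method used by the authors). Polynomials are coefficient sequences from degree 0 upward, and the function names are hypothetical.

```python
from functools import lru_cache

def divide_exact(num, den):
    """Exact division of integer polynomials; coefficients run from degree 0 upward
    and den must be monic."""
    rem = list(num)
    quot = [0] * (len(num) - len(den) + 1)
    for k in range(len(quot) - 1, -1, -1):
        q = rem[k + len(den) - 1]          # leading coefficient still to be cancelled
        quot[k] = q
        for i, d in enumerate(den):
            rem[k + i] -= q * d
    assert not any(rem), "division left a remainder"
    return quot

@lru_cache(maxsize=None)
def cyclotomic(m):
    """Phi_m as a tuple of integer coefficients, using X^m - 1 = prod_{d | m} Phi_d(X)."""
    poly = [-1] + [0] * (m - 1) + [1]      # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            poly = divide_exact(poly, cyclotomic(d))
    return tuple(poly)
```

For example, `cyclotomic(8)` returns `(1, 0, 0, 0, 1)`, i.e. $1 + X^4$, and `cyclotomic(105)` can be inspected to confirm the count of 33 monomials with coefficients in $\{-2, -1, 1\}$ for $m = 3 \cdot 5 \cdot 7$.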

To do this we first need to briefly recall the notion of a tensor product of fields. Let $K, L$ be two field extensions of $\mathbb{Q}$. Then the field tensor product $K \otimes L$ is defined as the set of all $\mathbb{Q}$-linear combinations of pure tensors $a \otimes b$ for $a \in K$, $b \in L$, where $\otimes$ is $\mathbb{Q}$-bilinear and satisfies the mixed-product property, i.e.,

\[ (a_1 \otimes b) + (a_2 \otimes b) = (a_1 + a_2) \otimes b \]
\[ (a \otimes b_1) + (a \otimes b_2) = a \otimes (b_1 + b_2) \]
\[ e (a \otimes b) = (ea) \otimes b = a \otimes (eb) \]
\[ (a_1 \otimes b_1)(a_2 \otimes b_2) = (a_1 a_2) \otimes (b_1 b_2) \]

for all $e \in \mathbb{Q}$. These properties define addition and multiplication in $K \otimes L$, and though the result is not always a field (because it may lack multiplicative inverses), it will always be one whenever we take the tensor product of two cyclotomic fields in this work. It is straightforward to verify that if $A, B$ are $\mathbb{Q}$-bases of $K, L$ respectively, then the Kronecker product $A \otimes B$ is a $\mathbb{Q}$-basis of $K \otimes L$. Later on we also consider tensor products of rings, or more generally of $\mathbb{Z}$-modules. These are defined in the same way, except that they are made up of only the $\mathbb{Z}$-linear combinations of pure tensors. This always yields a ring or $\mathbb{Z}$-module, respectively, with $\mathbb{Z}$-bases obtained by tensoring $\mathbb{Z}$-bases of the original objects.

A  key  fact  from  algebraic  number  theory  is  the  following. 

Proposition 2.13. Let $m$ have prime-power factorization $m = \prod_\ell m_\ell$, i.e., the $m_\ell$ are powers of distinct primes. Then $K = \mathbb{Q}(\zeta_m)$ is isomorphic to the tensor product $\bigotimes_\ell K_\ell$ of the fields $K_\ell = \mathbb{Q}(\zeta_{m_\ell})$, via the correspondence $\prod_\ell a_\ell \leftrightarrow (\bigotimes_\ell a_\ell)$, where on the left we implicitly embed each $a_\ell \in K_\ell$ into $K$.


2.5.2  Embeddings  and  Geometry 

Here  we  describe  the  embeddings  of  a  cyclotomic  number  field,  which  induce  a  ‘canonical’  geometry  on  it. 

The $m$th cyclotomic number field $K = \mathbb{Q}(\zeta_m)$ of degree $n = \varphi(m)$ has exactly $n$ ring homomorphisms (embeddings) $\sigma_i : K \to \mathbb{C}$ that fix every element of $\mathbb{Q}$. Concretely, for each $i \in \mathbb{Z}_m^*$, there is an embedding $\sigma_i$ defined by $\sigma_i(\zeta_m) = \omega_m^i$, where $\omega_m \in \mathbb{C}$ is some fixed primitive $m$th root of unity. Clearly, the embeddings come in pairs of complex conjugates, i.e., $\sigma_i = \overline{\sigma_{m-i}}$. The canonical embedding $\sigma : K \to \mathbb{C}^{\mathbb{Z}_m^*}$ is defined as

\[ \sigma(a) = (\sigma_i(a))_{i \in \mathbb{Z}_m^*}. \]

Due to the conjugate pairs, $\sigma$ actually maps into $H \subseteq \mathbb{C}^{\mathbb{Z}_m^*}$, defined in Section 2.2. Note that $\sigma$ is a ring homomorphism from $K$ to $H$, where multiplication and addition in $H$ are both component-wise.

By identifying $K$ with its canonical embedding into $H$, we endow $K$ with a canonical geometry. Recalling that norms on $H$ are just those induced from $\mathbb{C}^{\mathbb{Z}_m^*}$, we see that for any $a \in K$, the $\ell_2$ norm of $a$ is simply $\|a\|_2 = \|\sigma(a)\|_2 = (\sum_i |\sigma_i(a)|^2)^{1/2}$, and the $\ell_\infty$ norm is $\max_i |\sigma_i(a)|$. Because multiplication of embedded elements is component-wise, for any $a, b \in K$ we have

\[ \|a \cdot b\| \le \|a\|_\infty \cdot \|b\|, \tag{2.5} \]

where $\|\cdot\|$ denotes either the $\ell_2$ or $\ell_\infty$ norm (or indeed, any $\ell_p$ norm). Thus the $\ell_\infty$ norm acts as an "absolute value" for $K$ that bounds how much an element expands any other by multiplication. For example, note that for any power $\zeta$ of $\zeta_m$, each $\sigma_i(\zeta)$ must be a root of unity in $\mathbb{C}$, and hence $\|\zeta\|_2 = \sqrt{n}$ and $\|\zeta\|_\infty = 1$.

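The trace $\mathrm{Tr} = \mathrm{Tr}_{K/\mathbb{Q}} : K \to \mathbb{Q}$ can be defined as the sum of the embeddings: $\mathrm{Tr}(a) = \sum_i \sigma_i(a)$. Clearly, the trace is $\mathbb{Q}$-linear: $\mathrm{Tr}(a + b) = \mathrm{Tr}(a) + \mathrm{Tr}(b)$ and $\mathrm{Tr}(c \cdot a) = c \cdot \mathrm{Tr}(a)$ for all $a, b \in K$ and $c \in \mathbb{Q}$. Also notice that

\[ \mathrm{Tr}(a \cdot b) = \sum_i \sigma_i(a) \sigma_i(b) = \langle \sigma(a), \overline{\sigma(b)} \rangle, \]

so $\mathrm{Tr}(a \cdot b)$ is a symmetric bilinear form akin to the inner product of the embeddings of $a$ and $b$. The (field) norm $\mathrm{N} = \mathrm{N}_{K/\mathbb{Q}} : K \to \mathbb{Q}$ can be defined as the product of all the embeddings: $\mathrm{N}(a) = \prod_i \sigma_i(a)$. Clearly, the norm is multiplicative: $\mathrm{N}(a \cdot b) = \mathrm{N}(a) \cdot \mathrm{N}(b)$.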
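The definitions of this subsection can be explored numerically with a short sketch. It fixes $\omega_m = \exp(2\pi\sqrt{-1}/m)$, represents an element of $K$ by its coefficients in powers of $\zeta_m$ (any number of powers is accepted, not only the power basis), and works in floating point, so all results are approximate. The function names are hypothetical.

```python
import cmath
import math

def units(m):
    """Representatives of Z_m^*."""
    return [i for i in range(1, m + 1) if math.gcd(i, m) == 1]

def embed(coeffs, m):
    """Canonical embedding of a = sum_j coeffs[j] * zeta_m^j, one entry per i in Z_m^*."""
    omega = cmath.exp(2j * cmath.pi / m)
    return [sum(c * omega ** (i * j) for j, c in enumerate(coeffs)) for i in units(m)]

def l2_norm(a, m):
    return math.sqrt(sum(abs(x) ** 2 for x in embed(a, m)))

def linf_norm(a, m):
    return max(abs(x) for x in embed(a, m))

def trace(a, m):
    """Sum of the embeddings (a complex number whose imaginary part is ~0)."""
    return sum(embed(a, m))

def field_norm(a, m):
    """Product of the embeddings."""
    return math.prod(embed(a, m))

def poly_mul(a, b):
    """Product of two elements given as coefficient lists, without reduction."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out
```

For $m = 8$ (so $n = 4$), `l2_norm([0, 1], 8)` is approximately $2 = \sqrt{n}$ and `linf_norm([0, 1], 8)` is approximately 1, as stated above for powers of $\zeta_m$; embedding a product with `poly_mul` gives the component-wise product of the embeddings, which is what makes inequality (2.5) immediate.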

When taking $K = \bigotimes_\ell K_\ell$ as in Proposition 2.13, it follows directly from the definitions that $\sigma$ is the tensor product of the canonical embeddings $\sigma^{(\ell)}$ of $K_\ell$, i.e.,

\[ \sigma\Bigl( \bigotimes_\ell a_\ell \Bigr) = \bigotimes_\ell \sigma^{(\ell)}(a_\ell). \tag{2.6} \]

(Here the index set of $\sigma$ is $\prod_\ell \mathbb{Z}_{m_\ell}^*$, which corresponds bijectively to $\mathbb{Z}_m^*$ via the Chinese remainder theorem.) This decomposition of $\sigma$ in turn implies that the trace decomposes as

\[ \mathrm{Tr}_{K/\mathbb{Q}}\Bigl( \bigotimes_\ell a_\ell \Bigr) = \prod_\ell \mathrm{Tr}_{K_\ell/\mathbb{Q}}(a_\ell). \tag{2.7} \]

Using the canonical embedding also allows us to think of the Gaussian distribution $D_r$ over $H$ as a distribution over $K$, or more accurately, over the field tensor product $K_\mathbb{R} = K \otimes \mathbb{R}$, which is isomorphic as a real vector space to $H$ via $\sigma$. For our purposes it is usually helpful to ignore the distinction between $K$ and $K_\mathbb{R}$, and to approximate the latter by the former using sufficient precision.

2.5.3  Ring  of  Integers  and  Its  Ideals 

Let $R \subset K$ denote the set of all algebraic integers in a number field $K$. This set forms a ring (under the usual addition and multiplication operations in $K$), called the ring of integers of $K$. Note that the trace and norm of an algebraic integer are rational integers (i.e., in $\mathbb{Z}$), so we have the induced functions $\mathrm{Tr}, \mathrm{N} : R \to \mathbb{Z}$.

For the $m$th cyclotomic number field $K = \mathbb{Q}(\zeta_m)$ of degree $n = \varphi(m)$, the ring of integers happens to be $R = \mathbb{Z}[\zeta_m] \cong \mathbb{Z}[X]/\Phi_m(X)$, and hence has the power basis $\{\zeta_m^j\}_{j \in [n]}$ as a $\mathbb{Z}$-basis. Alternatively (and this is the view we adopt throughout the paper), we can view $R \cong \bigotimes_\ell R_\ell$ as a tensor product of the rings of integers $R_\ell$ in $K_\ell = \mathbb{Q}(\zeta_{m_\ell})$, where $m = \prod_\ell m_\ell$ is the prime-power factorization of $m$.

The (absolute) discriminant $\Delta_K$ of $K$ is a measure of the geometric sparsity of its ring of integers, defined as $\Delta_K = \det(\sigma(R))^2$, the squared determinant of the lattice $\sigma(R)$ (see footnote 6). The discriminant of the $m$th cyclotomic number field is

\[ \Delta_K = \Bigl( \frac{m}{\prod_{\text{prime } p \mid m} p^{1/(p-1)}} \Bigr)^n \le n^n, \tag{2.8} \]

where the product in the denominator runs over all primes $p$ dividing $m$. The above inequality is tight exactly when $m$ is a power of two.
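Equation (2.8) can be evaluated exactly with integer arithmetic, because $p - 1$ divides $n = \varphi(m)$ for every prime $p$ dividing $m$. The following sketch does this with simple trial-division helpers; the function names are hypothetical and the code is meant only for small $m$.

```python
def prime_factors(m):
    """Distinct prime divisors of m, by trial division."""
    primes, p = [], 2
    while p * p <= m:
        if m % p == 0:
            primes.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        primes.append(m)
    return primes

def totient(m):
    """Euler's totient function phi(m)."""
    n = m
    for p in prime_factors(m):
        n = n // p * (p - 1)
    return n

def cyclotomic_discriminant(m):
    """Delta_K = (m / prod_p p^(1/(p-1)))^n from Equation (2.8), computed exactly."""
    n = totient(m)
    denom = 1
    for p in prime_factors(m):
        denom *= p ** (n // (p - 1))       # (p - 1) divides n, so the exponent is an integer
    return m ** n // denom
```

For instance, `cyclotomic_discriminant(8)` gives $256 = 4^4 = n^n$, while `cyclotomic_discriminant(9)` gives $19683 < 6^6$, illustrating that the bound is tight for a power of two and strict otherwise.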

An (integral) ideal $\mathcal{I} \subseteq R$ is a nontrivial (i.e., $\mathcal{I} \ne \emptyset$ and $\mathcal{I} \ne \{0\}$) additive subgroup that is closed under multiplication by $R$, i.e., $r \cdot a \in \mathcal{I}$ for any $r \in R$ and $a \in \mathcal{I}$ (see footnote 7). A principal ideal $\mathcal{I}$ is one that is generated by a single element, i.e., $\mathcal{I} = uR$ for some $u \in R$ which is unique up to multiplication by units in $R$; we sometimes write $\mathcal{I} = \langle u \rangle$. An ideal $\mathcal{I}$ always has a $\mathbb{Z}$-basis of cardinality $n$, which is not unique; if $\mathcal{I} = \langle u \rangle$ and $B$ is any $\mathbb{Z}$-basis of $R$, then $uB$ is a $\mathbb{Z}$-basis of $\mathcal{I}$. A fractional ideal $\mathcal{I} \subset K$ is a set such that $d\mathcal{I} \subseteq R$ is an integral ideal for some $d \in R$, and is principal if it equals $uR$ for some $u \in K$. Any fractional ideal $\mathcal{I}$ embeds under $\sigma$ as a lattice $\sigma(\mathcal{I})$ in $H$, which we call an ideal lattice. We identify $\mathcal{I}$ with this lattice and associate with $\mathcal{I}$ all the usual lattice quantities (determinant, minimum distance, etc.).

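The norm of an ideal $\mathcal{I}$ is its index as an additive subgroup of $R$, i.e., $\mathrm{N}(\mathcal{I}) = |R/\mathcal{I}|$. This notion of norm generalizes the field norm, in that $\mathrm{N}(\langle a \rangle) = |\mathrm{N}(a)|$ for any $a \in R$, and $\mathrm{N}(\mathcal{I}\mathcal{J}) = \mathrm{N}(\mathcal{I}) \mathrm{N}(\mathcal{J})$. The norm of a fractional ideal $\mathcal{I}$ is defined as $\mathrm{N}(\mathcal{I}) = \mathrm{N}(d\mathcal{I}) / |\mathrm{N}(d)|$, where $d \in R$ is such that $d\mathcal{I} \subseteq R$. It follows that the determinant of an ideal lattice $\mathcal{I}$ is

\[ \det(\sigma(\mathcal{I})) = \mathrm{N}(\mathcal{I}) \cdot \sqrt{\Delta_K}. \tag{2.9} \]

The following lemma gives upper and lower bounds on the minimum distance of an ideal lattice. The upper bound is an immediate consequence of Minkowski's first theorem; the lower bound follows from the arithmetic mean/geometric mean inequality, and the fact that $|\mathrm{N}(a)| \ge \mathrm{N}(\mathcal{I})$ for any nonzero $a \in \mathcal{I}$.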

6 Some texts define the discriminant as a signed quantity, but in this work we only care about its magnitude.

7 Some texts also define the trivial set $\{0\}$ as an ideal, but in this work it is more convenient to exclude it.

Lemma 2.14. For any fractional ideal $\mathcal{I}$ in a number field $K$ of degree $n$,

\[ \sqrt{n} \cdot \mathrm{N}^{1/n}(\mathcal{I}) \le \lambda_1(\mathcal{I}) \le \sqrt{n} \cdot \mathrm{N}^{1/n}(\mathcal{I}) \cdot \sqrt{\Delta_K^{1/n}}. \]

The sum $\mathcal{I} + \mathcal{J}$ of two ideals is the set of all $a + b$ for $a \in \mathcal{I}$, $b \in \mathcal{J}$, and the product ideal $\mathcal{I}\mathcal{J}$ is the set of all finite sums of terms $ab$ for $a \in \mathcal{I}$, $b \in \mathcal{J}$. Multiplication extends to fractional ideals in the obvious way, and the set of fractional ideals forms a group under multiplication; in particular, every fractional ideal $\mathcal{I}$ has a (multiplicative) inverse ideal, written $\mathcal{I}^{-1}$.

Two ideals $\mathcal{I}, \mathcal{J} \subseteq R$ are coprime if $\mathcal{I} + \mathcal{J} = R$. An ideal $\mathfrak{p} \subsetneq R$ is prime if whenever $ab \in \mathfrak{p}$ for some $a, b \in R$, then $a \in \mathfrak{p}$ or $b \in \mathfrak{p}$ (or both). An ideal $\mathfrak{p}$ is prime if and only if it is maximal, i.e., if the only proper superideal of $\mathfrak{p}$ is $R$ itself, which implies that the quotient ring $R/\mathfrak{p}$ is a finite field. The ring $R$ has unique factorization of ideals, i.e., every ideal $\mathcal{I}$ can be expressed uniquely as a product of powers of prime ideals.


2.5.4  Duality 


Here we recall the notion of a dual ideal and explain its close connection to both the inverse ideal and the dual lattice. For more details, see [Con09] as an accessible reference.

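For any fractional ideal $\mathcal{I}$ in $K$, its dual is defined as

\[ \mathcal{I}^\vee = \{ a \in K : \mathrm{Tr}(a\mathcal{I}) \subseteq \mathbb{Z} \}. \]

It is easy to verify that $(\mathcal{I}^\vee)^\vee = \mathcal{I}$, that $\mathcal{I}^\vee$ is a fractional ideal, and that $\mathcal{I}^\vee$ embeds under $\sigma$ as the (conjugate) dual lattice of $\mathcal{I}$, as defined in Section 2.4.

For any $\mathbb{Q}$-basis $B = \{b_j\}$ of $K$, we denote its dual basis by $B^\vee = \{b_j^\vee\}$, which is characterized by $\mathrm{Tr}(b_i \cdot b_j^\vee) = \delta_{ij}$, the Kronecker delta. It is immediate that $(B^\vee)^\vee = B$, and if $B$ is a $\mathbb{Z}$-basis of some fractional ideal $\mathcal{I}$, then $B^\vee$ is a $\mathbb{Z}$-basis of its dual ideal $\mathcal{I}^\vee$. An important fact is that if $a = \sum_j a_j \cdot b_j$ for $a_j \in \mathbb{R}$ is the unique representation of $a \in K_\mathbb{R}$ in basis $B$, then $a_j = \mathrm{Tr}(a \cdot b_j^\vee)$ by linearity of trace.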


Suppose that $K = \bigotimes_\ell K_\ell$ as in Proposition 2.13. Then by linearity and the tensorial decomposition of the trace (Equation (2.7)), taking the dual commutes with tensoring, i.e., $(\bigotimes_\ell B_\ell)^\vee = \bigotimes_\ell B_\ell^\vee$ for any $\mathbb{Q}$-bases $B_\ell$ of $K_\ell$. In particular, this implies that $(\bigotimes_\ell \mathcal{I}_\ell)^\vee = \bigotimes_\ell \mathcal{I}_\ell^\vee$ for any fractional ideals $\mathcal{I}_\ell$ in $K_\ell$.

Except in the trivial number field $K = \mathbb{Q}$, the ring of integers $R$ is not self-dual, nor are an ideal and its inverse dual to each other. However, an ideal and its inverse are related by multiplication with the dual ideal $R^\vee$ of the ring: for any fractional ideal $\mathcal{I}$, its dual is $\mathcal{I}^\vee = \mathcal{I}^{-1} \cdot R^\vee$. The factor $R^\vee$ is often called the codifferent, and its inverse $(R^\vee)^{-1}$ the different, which is in fact an ideal in $R$. By Equation (2.9) and the fact that $\det(\sigma(R)) = \det(\sigma(R^\vee))^{-1}$, we have

\[ \mathrm{N}(R^\vee) = \Delta_K^{-1}. \tag{2.10} \]

The codifferent $R^\vee$ plays an important role in ring-LWE and its applications. The following material shows that $R^\vee$ is a principal ideal with a particularly simple generator, and that $(R^\vee)^{-1} \subseteq R$ is an integral ideal. We include proofs for completeness. We start with a useful lemma characterizing the traces of the powers of $\zeta_m$.

Lemma 2.15. Let $m$ be a power of a prime $p$ and $m' = m/p$, and $j$ be an integer. Then

\[ \mathrm{Tr}(\zeta_m^j) = \begin{cases} \varphi(p) \cdot m' & \text{if } j = 0 \bmod m, \\ -m' & \text{if } j = 0 \bmod m', \ j \ne 0 \bmod m, \\ 0 & \text{otherwise.} \end{cases} \]

Proof. The first case is immediate, since $\zeta_m^j = 1$. Otherwise, let $d = \gcd(j, m)$ and $\tilde{m} = m/d$, so $\mathrm{Tr}(\zeta_m^j) = d \cdot \mathrm{Tr}_{\mathbb{Q}(\zeta_{\tilde{m}})/\mathbb{Q}}(\zeta_{\tilde{m}}^{j/d})$. Because $j/d$ is coprime with $\tilde{m}$, the latter trace is the sum of all complex primitive $\tilde{m}$th roots of unity, which is $-1$ when $\tilde{m} = p$, and 0 otherwise. □
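Lemma 2.15 is easy to check numerically for small prime powers. The sketch below evaluates the trace as the sum $\sum_{i \in \mathbb{Z}_m^*} \omega_m^{ie}$ with $\omega_m = \exp(2\pi\sqrt{-1}/m)$ in floating point, rounds to the nearest integer, and compares with the three cases of the lemma; the exponent is called `e` in the code to avoid a clash with Python's imaginary unit, and both function names are hypothetical.

```python
import cmath
import math

def trace_of_power(m, e):
    """Tr(zeta_m^e) as the sum over i in Z_m^* of omega_m^(i*e), rounded to an integer."""
    total = sum(cmath.exp(2j * cmath.pi * i * e / m)
                for i in range(1, m + 1) if math.gcd(i, m) == 1)
    return round(total.real)

def lemma_2_15(m, p, e):
    """Closed form from Lemma 2.15 for m a power of the prime p."""
    m_prime = m // p
    if e % m == 0:
        return (p - 1) * m_prime          # phi(p) * m'
    if e % m_prime == 0:
        return -m_prime
    return 0
```

For $m = 9$, $p = 3$ the values for $e = 0, 3, 1$ are $6$, $-3$, and $0$ respectively, matching the three cases.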


Lemma 2.16. Let $m$ be a power of a prime $p$ and $m' = m/p$, and let $g = 1 - \zeta_p \in R = \mathbb{Z}[\zeta_m]$. Then $R^\vee = \langle g/m \rangle$; $p/g \in R$; and $\langle g \rangle$ and $\langle p' \rangle$ are coprime for every prime integer $p' \ne p$.


Proof. To prove the first claim, we first show that $g/m \in R^\vee$. Since the power basis is a $\mathbb{Z}$-basis of $R$, it is necessary and sufficient to show that $\mathrm{Tr}(\zeta_m^j \cdot g/m) = \mathrm{Tr}(\zeta_m^j - \zeta_m^{j+m'})/m$ is an integer for every $j \in [\varphi(m)]$. By Lemma 2.15 it is $(\varphi(p) + 1) m'/m = 1$ for $j = 0$, and 0 for all other $j$. Now to show that $R^\vee = \langle g/m \rangle$, it suffices to show that $\mathrm{N}(g/m) = \mathrm{N}(R^\vee)$, the latter of which is $p^{m'} / m^{\varphi(m)}$ by Equations (2.10) and (2.8). Now $\mathrm{N}(m) = m^{\varphi(m)}$, and $\mathrm{N}(1 - \zeta_p) = \mathrm{N}_{\mathbb{Q}(\zeta_p)/\mathbb{Q}}(1 - \zeta_p)^{m/p}$. Because the roots of $\Phi_p(X)$ are exactly the complex primitive $p$th roots of unity, the latter norm is exactly $\Phi_p(1) = p$, as desired.

To prove that $p/g \in R$, using $1 + \zeta_p + \zeta_p^2 + \cdots + \zeta_p^{p-1} = 0$ one may verify that

\[ p = (1 - \zeta_p) \bigl( (p - 1) + (p - 2) \zeta_p + \cdots + \zeta_p^{p-2} \bigr). \]

To prove the third claim, recall again that the norm of $\langle g \rangle$ is a power of $p$. Therefore, the norm of $\langle g \rangle + \langle p' \rangle$, being a divisor of both a power of $p$ and of $p'$, must be 1, implying that $\langle g \rangle$ and $\langle p' \rangle$ are coprime. □


Definition 2.17. For $R = \mathbb{Z}[\zeta_m]$, define $g = \prod_p (1 - \zeta_p) \in R$, where $p$ runs over all odd primes dividing $m$.
Also define $t = \hat{m}/g \in R$, where $\hat{m} = m/2$ if $m$ is even, otherwise $\hat{m} = m$.


Notice that $\hat{m}/g \in R$ because $(1 - \zeta_2) = 2$, so $\hat{m}/g = m / \prod_p (1 - \zeta_p) \in R$, where here $p$ runs over all
primes dividing $m$.

Corollary 2.18. Adopt the notation from Definition 2.17. Then $R^\vee = \langle g/\hat{m} \rangle = \langle t^{-1} \rangle$, and $\langle g \rangle$ is coprime
with $\langle p' \rangle$ for every prime integer $p'$ except those odd primes dividing $m$.


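Proof. Letting $m = \prod_\ell m_\ell$ be the prime-power factorization of $m$, where each $m_\ell$ is a power of some
prime $p_\ell$, and using the ring isomorphism $R \cong \bigotimes_\ell R_\ell$ where $R_\ell = \mathbb{Z}[\zeta_{m_\ell}]$, we can equivalently express $g$ as
$g = (\hat{m}/m)(\bigotimes_\ell g_\ell)$, where $g_\ell = (1 - \zeta_{p_\ell})$. Then by Lemma 2.16,

\[ R^\vee = \Bigl(\bigotimes_\ell R_\ell\Bigr)^\vee = \bigotimes_\ell (R_\ell^\vee) = \bigotimes_\ell (g_\ell/m_\ell) R_\ell = (g/\hat{m}) \cdot \Bigl(\bigotimes_\ell R_\ell\Bigr), \]

as desired.

For the coprimality claim, the norm of $g$ is a product of powers of the odd primes dividing $m$, and the
claim follows by the same reasoning as in Lemma 2.16. □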


2.5.5  Prime  Splitting  and  Chinese  Remainder  Theorem 

For an integer prime $p \in \mathbb{Z}$, the factorization of the principal ideal $\langle p \rangle \subset R = \mathbb{Z}[\zeta_m]$ is as follows. Let $d \geq 0$
be the largest integer such that $p^d$ divides $m$, let $h = \varphi(p^d)$, and let $f \geq 1$ be the multiplicative order of $p$
modulo $m/p^d$. Then $\langle p \rangle = \mathfrak{p}_1^h \cdots \mathfrak{p}_g^h$, where $g = n/(hf)$ and the $\mathfrak{p}_i$ are distinct prime ideals each of norm $p^f$.

A particular case of interest for us is the factorization of an integer prime $q = 1 \bmod m$, and the form
of its prime ideal factors. Here the order of $q$ modulo $m$ is 1, and so $\langle q \rangle$ “splits completely” into $n$ distinct
prime ideals of norm $q$. Notice that the field $\mathbb{Z}_q$ has a primitive $m$th root of unity $\omega_m$, because the multiplicative
group of $\mathbb{Z}_q$ is cyclic with order $q - 1$. Indeed, there are $n = \varphi(m)$ distinct such roots of unity $\omega_m^i \in \mathbb{Z}_q$, for
$i \in \mathbb{Z}_m^*$, and the prime ideal factors of $\langle q \rangle$ are simply $\mathfrak{q}_i = \langle q \rangle + \langle \zeta_m - \omega_m^i \rangle$. Therefore, each quotient ring
$R/\mathfrak{q}_i$ is isomorphic to the field $\mathbb{Z}_q$, via the map $\zeta_m \mapsto \omega_m^i$.

The Chinese Remainder Theorem says that if $\mathfrak{p}_i$ are pairwise coprime ideals in $R$, then the natural ring
homomorphism from $R/\prod_i \mathfrak{p}_i$ to the product ring $\prod_i (R/\mathfrak{p}_i)$ is in fact an isomorphism. To support efficient
operations in $R_q = R/qR$, we will use the following special case, which we use to define a special $\mathbb{Z}_q$-basis
of $R_q$ (see Section 5 for details).

Lemma 2.19. Let $q = 1 \bmod m$ be prime, and let $\omega_m \in \mathbb{Z}_q$ and ideals $\mathfrak{q}_i$ be as above. Then the natural
ring homomorphism $R/\langle q \rangle \to \prod_{i \in \mathbb{Z}_m^*} (R/\mathfrak{q}_i) \cong (\mathbb{Z}_q^n)$ is an isomorphism.
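As a concrete illustration of Lemma 2.19, the following sketch maps elements of $R_q$ to $\mathbb{Z}_q^n$ by evaluating at the roots $\omega_m^i$ and checks that multiplication becomes coordinate-wise. It assumes the toy parameters $m = 5$, $q = 11$ and restricts to prime $m$ so that reduction modulo $\Phi_m(X)$ is simple; all function names are hypothetical.

```python
def prime_factors(m):
    """Distinct prime factors of m by trial division (fine for toy sizes)."""
    factors, d = [], 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors

def root_of_unity(m, q):
    """Some element of multiplicative order exactly m in Z_q (q prime, q = 1 mod m)."""
    assert (q - 1) % m == 0
    for x in range(2, q):
        w = pow(x, (q - 1) // m, q)          # the order of w divides m
        if all(pow(w, m // r, q) != 1 for r in prime_factors(m)):
            return w
    raise ValueError("no element of order m found")

def mul_mod_phi_p(a, b, p, q):
    """Multiply in Z_q[X]/(Phi_p(X)) for prime p; inputs hold p-1 coefficients."""
    c = [0] * p
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % p] = (c[(i + j) % p] + ai * bj) % q   # reduce mod X^p - 1
    top = c[p - 1]
    # X^(p-1) = -(1 + X + ... + X^(p-2)) modulo Phi_p
    return [(ci - top) % q for ci in c[:p - 1]]

def crt_map(a, w, p, q):
    """Image of a in prod_i R/q_i = Z_q^n: evaluate a at w^i for i in Z_p^*."""
    return [sum(c * pow(w, i * k, q) for k, c in enumerate(a)) % q
            for i in range(1, p)]

p, q = 5, 11
w = root_of_unity(p, q)
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
lhs = crt_map(mul_mod_phi_p(a, b, p, q), w, p, q)
rhs = [x * y % q for x, y in zip(crt_map(a, w, p, q), crt_map(b, w, p, q))]
```

Here `lhs` and `rhs` agree, which is the ring-homomorphism property; bijectivity follows because the evaluation points are distinct elements of a field.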


2.6  Ring-LWE 

We now provide the formal definition of the ring-LWE problem and describe the worst-case hardness result
shown in [LPR10]. We remark that our definition here differs very slightly from the one used in [LPR10]: we
scale the $b$ component by a factor of $q$, so that it is an element of $K_\mathbb{R}/qR^\vee$ and not $K_\mathbb{R}/R^\vee$ as in [LPR10].
This is done for convenience when later discretizing the $b$ component, and the two definitions are easily seen
to be equivalent.

Definition 2.20 (Ring-LWE Distribution). For a “secret” $s \in R_q^\vee$ (or just $R^\vee$) and a distribution $\psi$
over $K_\mathbb{R}$, a sample from the ring-LWE distribution $A_{s,\psi}$ over $R_q \times (K_\mathbb{R}/qR^\vee)$ is generated by choosing
$a \leftarrow R_q$ uniformly at random, choosing $e \leftarrow \psi$, and outputting $(a, b = a \cdot s + e \bmod qR^\vee)$.

Definition 2.21 (Ring-LWE, Average-Case Decision). The average-case decision version of the ring-LWE
problem, denoted $R$-$\mathrm{DLWE}_{q,\psi}$, is to distinguish with non-negligible advantage between independent samples
from $A_{s,\psi}$, where $s \leftarrow R_q^\vee$ is uniformly random, and the same number of uniformly random and independent
samples from $R_q \times (K_\mathbb{R}/qR^\vee)$.

Theorem 2.22. Let $K$ be the $m$th cyclotomic number field having dimension $n = \varphi(m)$ and $R = \mathcal{O}_K$ be
its ring of integers. Let $\alpha = \alpha(n) > 0$, and let $q = q(n) \geq 2$, $q = 1 \bmod m$ be a $\mathrm{poly}(n)$-bounded prime
such that $\alpha q \geq \omega(\sqrt{\log n})$. Then there is a polynomial-time quantum reduction from $\tilde{O}(\sqrt{n}/\alpha)$-approximate
SIVP (or SVP) on ideal lattices in $K$ to the problem of solving $R$-$\mathrm{DLWE}_{q,\psi}$ given only $\ell$ samples, where $\psi$ is
the Gaussian distribution $D_{\xi q}$ for $\xi = \alpha \cdot (n\ell/\log(n\ell))^{1/4}$.


Note that the above worst-case hardness result deteriorates with the number of samples $\ell$. Since most
applications only require a small (or even a constant) number of samples, this is not a serious issue. In cases
where a large number of samples is needed, one can use two alternative hardness theorems proven in [LPR10].
The first assumes hardness of the search problem for spherical Gaussian error, which as yet lacks a reduction
from a worst-case problem. The second is a reduction from a worst-case problem, and it allows an arbitrary
number of samples without any deterioration in the approximation factor; it does, however, require the error
distribution to be non-spherical and chosen in a specific way, which makes it somewhat less convenient in
implementations. We refer to [LPR10] for additional information.

In applications it is often useful to work with a version of ring-LWE whose error distribution is discrete.
This leads naturally to a definition of $A_{s,\chi}$ for a discrete error distribution $\chi$ over $R^\vee$, with $b$ being an element
of $R_q^\vee$. We similarly modify Definition 2.21 by letting $R$-$\mathrm{DLWE}_{q,\chi}$ be the problem of distinguishing between
$A_{s,\chi}$ and uniform samples from $R_q \times R_q^\vee$. As we show next, for a wide family of discrete error distributions,
the hardness of the discrete version follows from that of the continuous one. In more detail, the lemma
below implies that if $R$-$\mathrm{DLWE}_{q,\psi}$ is hard with some number $\ell$ of samples, then so is $R$-$\mathrm{DLWE}_{q,\chi}$ with the
same number of samples, where the error distribution $\chi$ is $\lfloor p \cdot \psi \rceil_{w + pR^\vee}$ for some integer $p$ coprime to $q$,
$\lfloor \cdot \rceil$ is any valid discretization to (cosets of) $pR^\vee$, and $w$ is an arbitrary element in $R_p^\vee$ that can vary from
sample to sample (even adaptively and adversarially). In particular, for $p = 1$ we get hardness with error
distribution $\lfloor \psi \rceil_{R^\vee}$.

Lemma 2.23. Let $p$ and $q$ be positive coprime integers, and $\lfloor \cdot \rceil$ be a valid discretization to (cosets of) $pR^\vee$.
There exists an efficient transformation that on input $w \in R_p^\vee$ and a pair $(a', b') \in R_q \times K_\mathbb{R}/qR^\vee$, outputs a
pair $(a = pa' \bmod qR,\ b) \in R_q \times R_q^\vee$ with the following guarantees: if the input pair is uniformly distributed
then so is the output pair; and if the input pair is distributed according to the ring-LWE distribution $A_{s,\psi}$ for
some (unknown) $s \in R_q^\vee$ and distribution $\psi$ over $K_\mathbb{R}$, then the output pair is distributed according to $A_{s,\chi}$,
where $\chi = \lfloor p \cdot \psi \rceil_{w + pR^\vee}$.

Proof. Given $w$ and a sample $(a', b') \in R_q \times K_\mathbb{R}/qR^\vee$, the transformation discretizes $pb' \in K_\mathbb{R}/pqR^\vee$
to $\lfloor pb' \rceil_{w + pR^\vee} \in (w + pR^\vee)/pqR^\vee$. It then lets $a = pa' \bmod qR$ and $b = \lfloor pb' \rceil_{w + pR^\vee} \bmod qR^\vee$, and
outputs the sample $(a, b) \in R_q \times R_q^\vee$.

If the distribution of $(a', b')$ is $A_{s,\psi}$, then $pb' = (pa') \cdot s + pe' \bmod pqR^\vee$ for $e' \leftarrow \psi$. Because
$(pa') \cdot s \in pR^\vee/pqR^\vee$, by validity of the discretization we have that $\lfloor pb' \rceil_{w + pR^\vee}$ and $(pa') \cdot s + \lfloor pe' \rceil_{w + pR^\vee}$
are identically distributed. Because $p$ and $q$ are coprime, $a = pa' \bmod qR$ is uniformly random over $R_q$, so
$(a, b)$ has distribution $A_{s,\chi}$.

On the other hand, if $(a', b')$ is uniformly random, then $a$ is uniform over $R_q$. Moreover, since the
uniform distribution over $K_\mathbb{R}/pqR^\vee$ is invariant under shifts by $pR^\vee$, then by validity so is the distribution of
$b = \lfloor pb' \rceil_{w + pR^\vee} \bmod qR^\vee$, for any $w \in R_p^\vee$. Then because $p$ and $q$ are coprime, $b$ is uniformly random over
$R_q^\vee$ and independent of $a$, as desired. □


Finally, another important variant of ring-LWE, known as the “normal form,” is the one in which the
secret, instead of being uniformly distributed, is chosen from the error distribution (discretized to $R^\vee$ or
a coset of $pR^\vee$ as in Lemma 2.23 above). This modification makes the secret short, which is very useful
in some applications. We now show that this variant of ring-LWE is as hard as the original one, closely
following the technique of [ACPS09].


Lemma 2.24. Let $p$ and $q$ be positive coprime integers, $\lfloor \cdot \rceil$ be a valid discretization to (cosets of) $pR^\vee$,
and $w$ be an arbitrary element in $R_p^\vee$. If $R$-$\mathrm{DLWE}_{q,\psi}$ is hard given some number $\ell$ of samples, then so is the
variant of $R$-$\mathrm{DLWE}_{q,\psi}$ in which the secret is sampled from $\chi := \lfloor p \cdot \psi \rceil_{w + pR^\vee}$, given $\ell - 1$ samples.


Proof. We show how to solve the former problem given an oracle for the latter. Start by drawing one sample
from the unknown distribution and apply the transformation from Lemma 2.23 (with $p$, $w$, and $\lfloor \cdot \rceil$) to it.
Let $(a_0, b_0) \in R_q \times R_q^\vee$ be the result. If $a_0$ is not in $R_q^*$, abort and reject. Otherwise, let $a_0^{-1} \in R_q^*$ denote
its inverse. Draw $\ell - 1$ additional samples $(a_i, b_i) \in R_q \times K_\mathbb{R}/qR^\vee$ ($i = 1, \ldots, \ell - 1$) from the unknown
distribution, and return the oracle’s output when applied to the pairs

\[ (a'_i = -a_0^{-1} a_i,\ b'_i = b_i + a'_i b_0) \in R_q \times K_\mathbb{R}/qR^\vee. \]

To prove this gives a valid distinguisher, notice first that by Claim 2.25 below, it suffices to show a
noticeable distinguishing gap conditioned on $a_0$ being invertible. Next, observe that if the input distribution
is uniform, then so is the distribution of the pairs $(a'_i, b'_i)$. Finally, if the input distribution is $A_{s,\psi}$ for
some $s \in R_q^\vee$, then we have $b_0 = a_0 \cdot s + e_0$ where $e_0$ is distributed according to $\chi$. Therefore, for each
$i = 1, \ldots, \ell - 1$,

\[ b'_i = (a_i \cdot s + e_i) - a_0^{-1} a_i (a_0 \cdot s + e_0) = e_i + a'_i e_0, \]

where the $e_i$ are distributed according to $\psi$, and so the input to the oracle consists of independent samples
from $A_{e_0,\psi}$, as required. □


Claim 2.25. Consider the $m$th cyclotomic field of degree $n = \varphi(m)$ for some $m \geq 2$. Then for any $q \geq 2$,
the fraction of invertible elements in $R_q$ is at least $1/\mathrm{poly}(n, \log q)$.


When $q = 1 \bmod m$ is a prime (as in Theorem 2.22), we have by Lemma 2.19 that the fraction of invertible
elements in $R_q$ is $(1 - 1/q)^n \geq (1 - 1/(n+1))^n \geq e^{-1}$. This uses the inequality $1 - 1/(\alpha + 1) \geq e^{-1/\alpha}$
for $\alpha > 0$, which we will use again in the proof below.
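The prime case above can be confirmed by brute force for tiny parameters. The sketch below assumes $m = 5$ and $q = 11$ with $\omega_m = 4$ (an element of order 5 in $\mathbb{Z}_{11}$, chosen here for illustration), and uses the isomorphism of Lemma 2.19: an element is invertible exactly when all of its coordinates in $\mathbb{Z}_q^n$ are nonzero.

```python
from itertools import product
from math import exp

p, q, w = 5, 11, 4            # toy parameters; 4 has order 5 in Z_11
n = p - 1                     # n = phi(5)
assert pow(w, p, q) == 1 and w != 1

def coords(a):
    """Coordinates of a in Z_q^n: evaluate the coefficient vector at w^i, i in Z_p^*."""
    return [sum(c * pow(w, i * k, q) for k, c in enumerate(a)) % q
            for i in range(1, p)]

# Enumerate all q^n elements of R_q and count those with no zero coordinate.
units = sum(1 for a in product(range(q), repeat=n) if all(coords(a)))
fraction = units / q ** n
lower_bounds = ((1 - 1 / (n + 1)) ** n, exp(-1))
```

For these parameters the count is $(q-1)^n = 10^4$ out of $11^4$ elements, a fraction of about 0.68, comfortably above $e^{-1}$.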

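Proof. We first observe that for any integer $r \geq 1$ and prime ideal $\mathfrak{p}$, an element $a \in R$ is invertible modulo $\mathfrak{p}^r$
if and only if $a \neq 0 \bmod \mathfrak{p}$, and therefore the fraction of uninvertible elements in $R/\mathfrak{p}^r$ is $1/N(\mathfrak{p})$. One
direction is obvious: if $a = 0 \bmod \mathfrak{p}$, then so is $a \cdot b$ for any $b \in R$, so $a$ is uninvertible (because $1 \notin \mathfrak{p}$). For
the other direction, if $a \neq 0 \bmod \mathfrak{p}$, then $\mathfrak{p} \nmid \langle a \rangle$, and so $\langle a \rangle$, $\mathfrak{p}^r$ are coprime, i.e., $\langle a \rangle + \mathfrak{p}^r = R$. Therefore,
there exists $b \in R$ such that $ab \in 1 + \mathfrak{p}^r$.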


Using the factorization of the ideals $\langle p \rangle$ given in Section 2.5.5 and the Chinese remainder theorem, we
get that the fraction of invertible elements in $R_q$ is

\[ \prod_{\text{prime } p \mid q} (1 - p^{-f_p})^{n/(f_p \varphi(p^{d_p}))} \geq \prod_{\text{prime } p \mid q} (1 - p^{-f_p})^{n/\varphi(p^{d_p})}, \tag{2.11} \]

where $d_p$ is the largest integer such that $p^{d_p}$ divides $m$ and $f_p$ is the multiplicative order of $p$ modulo $m/p^{d_p}$.
For any prime $p$ we clearly have $p^{f_p} > m/p^{d_p}$, and therefore

\[ (1 - p^{-f_p})^{n/\varphi(p^{d_p})} = (1 - p^{-f_p})^{\varphi(m/p^{d_p})} \geq (1 - p^{-f_p})^{m/p^{d_p}} \geq e^{-1}. \]

As a result, the product in (2.11), restricted to primes $p$ dividing $m$, of which there are at most $\log_2 m$, is at
least $1/\mathrm{poly}(m)$. It therefore suffices to bound from below the product in (2.11) restricted to primes $p$ not
dividing $m$. For such primes $p$ we have $d_p = 0$, and the expression simplifies to

\[ \prod_{p \mid q,\ p \nmid m} (1 - p^{-f_p})^n, \tag{2.12} \]

where $f_p$ is the multiplicative order of $p$ modulo $m$. Notice that the values $p^{f_p}$ are distinct for distinct $p$.
Moreover, they are all 1 modulo $m$. Therefore, since the product in (2.12) includes at most $\log_2 q$ terms, we
can bound it from below by

\[ \prod_{k=1}^{\log_2 q} \Bigl(1 - \frac{1}{km + 1}\Bigr)^n \geq \prod_{k=1}^{\log_2 q} e^{-n/(km)} \geq \prod_{k=1}^{\log_2 q} e^{-1/k} \geq e^{-1} \prod_{k=2}^{\log_2 q} \Bigl(1 - \frac{1}{k}\Bigr) = (e \cdot \log_2 q)^{-1}. \]

□


3 Sparse Decompositions of DFT and CRT


Here we give structured (or “sparse”) decompositions of two important linear transformations, which lead to
fast algorithms for applying them. We follow the algebraic framework of [PM08].

Definition 3.1. Let $m$ be a prime power and let $\mathcal{R}$ denote any commutative ring containing some element $\omega_m$
of multiplicative order $m$, i.e., a primitive $m$th root of unity.

• The discrete Fourier transform $\mathrm{DFT}_m$ over $\mathcal{R}$ is the $\mathbb{Z}_m$-by-$\mathbb{Z}_m$ matrix whose $(i, j)$th entry is $\omega_m^{ij}$.

• The Chinese remainder transform $\mathrm{CRT}_m$ over $\mathcal{R}$ is the (square) submatrix of $\mathrm{DFT}_m$ obtained by
restricting to the rows indexed by $\mathbb{Z}_m^*$ and the columns indexed by $[\varphi(m)]$.

For an arbitrary positive integer $m$ having prime-power factorization $m = \prod_\ell m_\ell$, where $\mathcal{R}$ has an $m$th root
of unity (and hence has primitive $m_\ell$th roots of unity for each $m_\ell$), the DFT and CRT matrices are

\[ \mathrm{DFT}_m = \bigotimes_\ell \mathrm{DFT}_{m_\ell} \quad \text{and} \quad \mathrm{CRT}_m = \bigotimes_\ell \mathrm{CRT}_{m_\ell}. \]

We identify the matrices $\mathrm{DFT}_m$ and $\mathrm{CRT}_m$ with the linear transforms they represent.

For a prime power $m$, applying $\mathrm{DFT}_m$ corresponds with evaluating a polynomial in $\mathcal{R}[X]$ of degree less
than $m$ (represented by its vector of coefficients in the natural order) at all the $m$th roots of unity $\omega_m^i \in \mathcal{R}$
for $i \in [m]$. Similarly, $\mathrm{CRT}_m$ corresponds with evaluating a polynomial of degree less than $\varphi(m)$ at all
the primitive $m$th roots of unity $\omega_m^i$ for $i \in \mathbb{Z}_m^*$. (This interpretation, and its connection with Lemma 2.19,
explains our choice of the name “Chinese remainder transform.”)

For $m$ with prime-power factorization $m = \prod_\ell m_\ell$, it can be shown using the Good-Thomas decomposition
that $\mathrm{DFT}_m$ again corresponds with polynomial evaluation at all $m$th roots of unity, but under some
permutations of the input and output vectors. For $\mathrm{CRT}_m$, the correspondence with polynomial evaluation
is different, because the columns of $\mathrm{CRT}_m$ typically do not correspond to powers $0, \ldots, \varphi(m) - 1$ of a
primitive $m$th root of unity $\omega_m$. Instead, $\mathrm{CRT}_m$ corresponds with evaluation of a multivariate polynomial
(with one variable per factor $m_\ell$) at all input tuples in which the $\ell$th element is a primitive $m_\ell$th root of unity.
We adopt the tensorial form of $\mathrm{CRT}_m$ because it corresponds directly with the tensorial (or multivariate)
decomposition of the $m$th cyclotomic number field, and admits a finer-grained decomposition and more
efficient algorithms than the univariate perspective.

Decomposition of $\mathrm{DFT}_m$. Let $m$ be a power of some prime $p$, and let $m' = m/p$. Using the Cooley-Tukey
decomposition we can express $\mathrm{DFT}_m$ in terms of smaller DFTs of dimensions $p$ and $m'$, and by iterating,
in terms of $\mathrm{DFT}_p$ alone. Reindex the columns of $\mathrm{DFT}_m$ by pairs $(j_0, j_1) \in [p] \times [m']$, using the standard
correspondence $j = m' j_0 + j_1 \in [m]$. Similarly, reindex the rows by pairs $(i_0, i_1) \in [p] \times [m']$, this time
using the (nonstandard) correspondence $i = p i_1 + i_0 \in [m]$.^8 We then have the decomposition

\[ \mathrm{DFT}_m = (I_{[p]} \otimes \mathrm{DFT}_{m'}) \cdot T_m \cdot (\mathrm{DFT}_p \otimes I_{[m']}), \tag{3.1} \]

where all three terms are $([p] \times [m'])$-by-$([p] \times [m'])$ matrices, and $T_m$ is the diagonal “twiddle” matrix having
entry $\omega_m^{i_0 j_1}$ in its $(i_0, j_1)$th diagonal entry. Therefore, applying $\mathrm{DFT}_m$ reduces to $m'$ parallel applications
of $\mathrm{DFT}_p$, followed by $m$ parallel scalar multiplications by twiddle factors, followed by $p$ parallel applications
of $\mathrm{DFT}_{m'}$. Of course, each $\mathrm{DFT}_{m'}$ can be further decomposed in the same way, down to the $\mathrm{DFT}_p$ base
case. Using any of the Rader, Winograd, or Bluestein FFT algorithms, we can apply such base cases in
$O(p \log p)$ time, which implies that $\mathrm{DFT}_m$ can be applied in $O(n \log n)$ time, where $n = \varphi(m)$.

^8 This relabeling corresponds with the “bit-reversal” or related “stride” output permutation in the standard decimation-in-frequency
FFT algorithm. In an implementation, the permutation can be omitted because the output does not need to be in any particular order.

To verify Equation (3.1), it suffices by linearity to compare the action of both sides on the standard basis
vectors. Take any $(j_0, j_1) \in [p] \times [m']$ and consider the vector with 1 in location $(j_0, j_1)$ and zero elsewhere.
Applying $\mathrm{DFT}_p \otimes I_{[m']}$ to it yields the vector that is $\omega_p^{i_0 j_0}$ in locations $(i_0, j_1)$ for $i_0 \in [p]$ and zero elsewhere.
The matrix $T_m$ changes these nonzero entries to $\omega_p^{i_0 j_0} \cdot \omega_m^{i_0 j_1}$, and finally, $I_{[p]} \otimes \mathrm{DFT}_{m'}$ yields the vector with

\[ \omega_p^{i_0 j_0} \cdot \omega_m^{i_0 j_1} \cdot \omega_{m'}^{i_1 j_1} = \omega_m^{m' i_0 j_0 + i_0 j_1 + p i_1 j_1} = \omega_m^{(p i_1 + i_0)(m' j_0 + j_1)} \]

in any location $(i_0, i_1) \in [p] \times [m']$, as required.
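The three factors of Equation (3.1) translate directly into a recursive procedure. The sketch below works over $\mathcal{R} = \mathbb{Z}_q$ so that all arithmetic is exact; it assumes the caller supplies an element `w` of order $m$ in $\mathbb{Z}_q$, and it uses a naive quadratic-time $\mathrm{DFT}_p$ base case rather than the Rader, Winograd, or Bluestein algorithms. Function names are hypothetical.

```python
def dft_naive(x, w, q):
    """DFT over Z_q by definition: entry (i, j) of the matrix is w^(i*j)."""
    m = len(x)
    return [sum(x[j] * pow(w, i * j, q) for j in range(m)) % q for i in range(m)]

def dft_cooley_tukey(x, p, w, q):
    """Apply DFT_m via Equation (3.1); len(x) = m is a power of p, w has order m."""
    m = len(x)
    if m == p:
        return dft_naive(x, w, q)              # DFT_p base case
    mp = m // p                                # m' = m / p
    w_p = pow(w, mp, q)                        # root of order p
    w_mp = pow(w, p, q)                        # root of order m'
    out = [0] * m
    for i0 in range(p):
        # (DFT_p (x) I_[m']) followed by the twiddle factor w^(i0 * j1),
        # with columns indexed by j = m' * j0 + j1.
        y = [sum(x[mp * j0 + j1] * pow(w_p, i0 * j0, q) for j0 in range(p))
             * pow(w, i0 * j1, q) % q
             for j1 in range(mp)]
        # (I_[p] (x) DFT_m'): one smaller DFT per value of i0.
        z = dft_cooley_tukey(y, p, w_mp, q)
        for i1 in range(mp):
            out[p * i1 + i0] = z[i1]           # rows indexed by i = p * i1 + i0
    return out

# m = 8 over Z_17 (2 has order 8) and m = 9 over Z_19 (4 has order 9).
x8, x9 = list(range(8)), list(range(1, 10))
fast8, slow8 = dft_cooley_tukey(x8, 2, 2, 17), dft_naive(x8, 2, 17)
fast9, slow9 = dft_cooley_tukey(x9, 3, 4, 19), dft_naive(x9, 4, 19)
```

Writing the result back at position $p i_1 + i_0$ undoes the nonstandard row indexing, so the output is in natural order and can be compared entry by entry with the naive transform.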


Decomposition of $\mathrm{CRT}_m$. Letting $m$, $p$, and $m'$ be as above, notice that $\varphi(m) = \varphi(p) \cdot m'$. Moreover, with
the above reindexing of rows and columns, $\mathrm{CRT}_m$ is the submatrix of $\mathrm{DFT}_m$ restricted to rows $\mathbb{Z}_p^* \times [m']$
and columns $[\varphi(p)] \times [m']$. By appropriately restricting the matrices in Equation (3.1), we obtain the
decomposition (which can be verified in the same way as above)

\[ \mathrm{CRT}_m = (I_{\mathbb{Z}_p^*} \otimes \mathrm{DFT}_{m'}) \cdot \hat{T}_m \cdot (\mathrm{CRT}_p \otimes I_{[m']}), \tag{3.2} \]

where $\hat{T}_m$ is the diagonal twiddle matrix $T_m$ from above, restricted to the rows and columns indexed by
$\mathbb{Z}_p^* \times [m']$. Applying $\mathrm{CRT}_m$ therefore reduces to $m'$ parallel applications of $\mathrm{CRT}_p$, followed by $\varphi(m)$
parallel scalar multiplications by twiddle factors, followed by $\varphi(p)$ parallel applications of $\mathrm{DFT}_{m'}$.
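Equation (3.2) can be exercised the same way as the DFT decomposition. The following sketch again takes $\mathcal{R} = \mathbb{Z}_q$ with a caller-supplied root `w` of order $m$ (an assumption of this illustration), applies the three factors in turn, and compares against the submatrix definition of $\mathrm{CRT}_m$. The inner $\mathrm{DFT}_{m'}$ is computed naively for brevity; names are hypothetical.

```python
from math import gcd

def dft_naive(x, w, q):
    """DFT over Z_q by definition."""
    m = len(x)
    return [sum(x[j] * pow(w, i * j, q) for j in range(m)) % q for i in range(m)]

def crt_naive(x, m, w, q):
    """CRT_m by definition: rows i in Z_m^*, columns j in [phi(m)]; keyed by row i."""
    rows = [i for i in range(m) if gcd(i, m) == 1]
    return {i: sum(c * pow(w, i * j, q) for j, c in enumerate(x)) % q for i in rows}

def crt_sparse(x, p, m, w, q):
    """CRT_m via Equation (3.2) for m a power of p; len(x) = phi(m) = (p-1) * m'."""
    mp = m // p
    w_p, w_mp = pow(w, mp, q), pow(w, p, q)    # roots of order p and m'
    out = {}
    for i0 in range(1, p):                     # i0 ranges over Z_p^*
        # (CRT_p (x) I_[m']) then the restricted twiddle w^(i0 * j1)
        y = [sum(x[mp * j0 + j1] * pow(w_p, i0 * j0, q) for j0 in range(p - 1))
             * pow(w, i0 * j1, q) % q
             for j1 in range(mp)]
        z = dft_naive(y, w_mp, q)              # (I (x) DFT_m'), naive here
        for i1 in range(mp):
            out[p * i1 + i0] = z[i1]           # row index i = p * i1 + i0
    return out

x9 = [1, 2, 3, 4, 5, 6]                        # phi(9) = 6 coefficients
x8 = [3, 1, 4, 1]                              # phi(8) = 4 coefficients
same9 = crt_sparse(x9, 3, 9, 4, 19) == crt_naive(x9, 9, 4, 19)
same8 = crt_sparse(x8, 2, 8, 2, 17) == crt_naive(x8, 8, 2, 17)
```

The row index $i = p i_1 + i_0$ is a unit modulo $m$ exactly when $i_0 \neq 0$, which is why restricting $i_0$ to $\mathbb{Z}_p^*$ produces precisely the rows of $\mathrm{CRT}_m$.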


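Inversion. Using the inversion rules for matrix multiplication and the Kronecker product, the inverse DFT
and CRT decompose as

\[ \mathrm{DFT}_m^{-1} = (\mathrm{DFT}_p^{-1} \otimes I_{[m']}) \cdot T_m^{-1} \cdot (I_{[p]} \otimes \mathrm{DFT}_{m'}^{-1}) \tag{3.3} \]
\[ \mathrm{CRT}_m^{-1} = (\mathrm{CRT}_p^{-1} \otimes I_{[m']}) \cdot \hat{T}_m^{-1} \cdot (I_{\mathbb{Z}_p^*} \otimes \mathrm{DFT}_{m'}^{-1}), \tag{3.4} \]

and can be applied at exactly the same cost as their forward counterparts. Note that the row and column index
sets of $\mathrm{CRT}_m$ are different (as they are for $\mathrm{CRT}_p$ and $\hat{T}_m$ as well), so $\mathrm{CRT}_m^{-1} \cdot \mathrm{CRT}_m$ and $\mathrm{CRT}_m \cdot \mathrm{CRT}_m^{-1}$
are “different” matrices, although they are both still identity matrices over the appropriate index sets.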


Arbitrary $m$. For $m$ that may have more than one prime divisor, the tensorial form of $\mathrm{CRT}_m$ leads
immediately to a fast algorithm. Specifically, if $m$ has prime-power factorization $m = \prod_\ell m_\ell$, then by
the mixed-product property, applying $\mathrm{CRT}_m = \bigotimes_\ell \mathrm{CRT}_{m_\ell}$ reduces to $\varphi(m/m_\ell)$ parallel applications
of $\mathrm{CRT}_{m_\ell}$, in sequence for each $\ell$ (see the end of Section 2.1). Since each $\mathrm{CRT}_{m_\ell}$ can be applied in
$O(m_\ell \log m_\ell)$ time and $O(\log m_\ell)$ parallel depth, the total runtime and parallel depth are $O(m \log m)$ and
$O(\log m)$, respectively.
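The mixed-product step can be illustrated for $m = 15$. The sketch below assumes $\mathcal{R} = \mathbb{Z}_{31}$ with roots of unity 5 (order 3) and 2 (order 5), both chosen for this example, and applies $\mathrm{CRT}_3 \otimes \mathrm{CRT}_5$ one tensor factor at a time without forming the Kronecker product. The small matrices are applied naively; the helper names are hypothetical.

```python
from math import gcd

def crt_matrix(m, w, q):
    """CRT_m over Z_q for prime power m: rows i in Z_m^*, columns j in [phi(m)]."""
    rows = [i for i in range(m) if gcd(i, m) == 1]
    return [[pow(w, i * j, q) for j in range(len(rows))] for i in rows]

def kron(A, B, q):
    """Explicit Kronecker product A (x) B over Z_q (used only as a reference)."""
    return [[a * b % q for a in ra for b in rb] for ra in A for rb in B]

def matvec(A, x, q):
    return [sum(a * v for a, v in zip(row, x)) % q for row in A]

def apply_tensor(A, B, x, q):
    """(A (x) B) x computed as (A (x) I)(I (x) B) x, one factor at a time."""
    na, nb = len(A), len(B)
    # I (x) B: apply B to each of the na contiguous blocks of length nb.
    y = [v for k in range(na) for v in matvec(B, x[k * nb:(k + 1) * nb], q)]
    # A (x) I: apply A across blocks, once for each of the nb offsets.
    z = [0] * (na * nb)
    for off in range(nb):
        col = matvec(A, [y[k * nb + off] for k in range(na)], q)
        for k in range(na):
            z[k * nb + off] = col[k]
    return z

q = 31
A, B = crt_matrix(3, 5, q), crt_matrix(5, 2, q)    # CRT_3 and CRT_5
x = list(range(1, 9))                              # phi(15) = 8 coefficients
structured = apply_tensor(A, B, x, q)
reference = matvec(kron(A, B, q), x, q)
```

In this example $\mathrm{CRT}_5$ is applied $\varphi(3) = 2$ times and $\mathrm{CRT}_3$ is applied $\varphi(5) = 4$ times, matching the count $\varphi(m/m_\ell)$ in the text.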


4  The  Powerful  Basis 

In this section (and Section 6) we study certain $\mathbb{Z}$-bases of certain fractional ideals $\mathcal{I}$ in $K = \mathbb{Q}(\zeta_m)$, which
are therefore $\mathbb{Z}_q$-bases of the quotients $\mathcal{I}_q = \mathcal{I}/q\mathcal{I}$ for any positive integer $q$. Fixing such a basis $\vec{b}$ and
viewing it as a (column) vector over $\mathcal{I}$, we can represent any $a \in \mathcal{I}$ (respectively, $a \in \mathcal{I}_q$) uniquely as
$a = \langle \vec{b}, \mathbf{a} \rangle = \vec{b}^T \cdot \mathbf{a}$ for some coefficient vector $\mathbf{a}$ over $\mathbb{Z}$ (respectively, $\mathbb{Z}_q$) having the same index set as
$\vec{b}$. Our algorithms simply store and operate on these coefficient vectors, while also keeping track of the
corresponding basis, which will be one of the few we define below. Notice that by linearity, if we have some
$a \in \mathcal{I}$ represented by coefficient vector $\mathbf{a}$ in basis $\vec{b}$, then $\mathbf{a}$ is also the representation of $ra \in r\mathcal{I}$ in the basis
$r\vec{b}$ for any $r \in K$, so we can switch between the two values at essentially no cost.

Here we define a certain useful $\mathbb{Z}$-basis of $R$, and hence $\mathbb{Q}$-basis of $K$. We call it the “powerful” basis,
due to its decomposition in terms of the power bases of prime-power cyclotomics, and the fast algorithms
associated with it.^9

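Definition 4.1. The powerful basis $\vec{p}$ of $K = \mathbb{Q}(\zeta_m)$ and $R = \mathbb{Z}[\zeta_m]$ is defined as follows:

• For a prime power $m$, define $\vec{p}$ to be the power basis $(\zeta_m^j)_{j \in [\varphi(m)]}$, treated as a vector over $R \subset K$.

• For $m$ having prime-power factorization $m = \prod_\ell m_\ell$, define $\vec{p} = \bigotimes_\ell \vec{p}_\ell$, the tensor product of the
power(ful) bases $\vec{p}_\ell$ of each $K_\ell = \mathbb{Q}(\zeta_{m_\ell})$.

For any power $\mathcal{I} = (R^\vee)^k$ of $R^\vee = \langle t^{-1} \rangle$, define the powerful basis of $\mathcal{I}$ to be $t^{-k} \cdot \vec{p}$.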


By definition of the tensor product, $\vec{p}$ is a vector with index set $\prod_\ell [\varphi(m_\ell)]$. So to specify an entry of $\vec{p}$
we need one index $j_\ell \in [\varphi(m_\ell)]$ per prime divisor of $m$, and the specified entry is $p_{(j_\ell)} = \prod_\ell \zeta_{m_\ell}^{j_\ell}$. Note
that because $\zeta_{m_\ell} = \zeta_m^{m/m_\ell} \in K$, it is possible to “flatten” the index set to a size-$\varphi(m)$ subset of $[m]$, where
index tuple $(j_\ell)$ maps to $j = \sum_\ell (m/m_\ell) j_\ell \bmod m$, and $p_j = \zeta_m^j$. We note that unless $m$ is a prime power,
the flattened index set is not equal to $[\varphi(m)]$, so the powerful basis differs from the power basis, although
it still consists of powers of $\zeta_m$. For instance, for $m = 15$ and $\zeta = \zeta_{15}$, the powerful basis consists of
$\zeta^0, \zeta^3, \zeta^5, \zeta^6, \zeta^8, \zeta^9, \zeta^{11}$, and $\zeta^{14}$. Because the flattened indices tend to be a somewhat irregular subset of $[m]$,
it is usually preferable to maintain the structured index set.
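The flattening rule $j = \sum_\ell (m/m_\ell) j_\ell \bmod m$ is easy to tabulate. The short sketch below (with hypothetical helper names) enumerates the structured index tuples and returns the flattened exponents in sorted order; for $m = 15$ it reproduces the eight exponents listed above.

```python
from itertools import product

def prime_power_factors(m):
    """Prime-power factorization of m as a list of (p, p^k) pairs."""
    out, d = [], 2
    while m > 1:
        if m % d == 0:
            pk = 1
            while m % d == 0:
                m //= d
                pk *= d
            out.append((d, pk))
        d += 1
    return out

def powerful_indices(m):
    """Sorted exponents j such that zeta_m^j belongs to the powerful basis."""
    factors = prime_power_factors(m)
    # one index j_l in [phi(m_l)] per prime-power factor m_l = p^k
    ranges = [range(pk - pk // p) for p, pk in factors]
    return sorted(
        sum((m // pk) * j for (_, pk), j in zip(factors, js)) % m
        for js in product(*ranges)
    )

indices_15 = powerful_indices(15)   # [0, 3, 5, 6, 8, 9, 11, 14]
indices_8 = powerful_indices(8)     # prime power: coincides with [phi(8)] = [0, 1, 2, 3]
```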

Observe that $\vec{p}^T$ is a row vector (over $K$) with columns indexed by $\prod_\ell [\varphi(m_\ell)]$. Applying the canonical
embedding $\sigma$ entry-wise to obtain column vectors indexed by $\mathbb{Z}_m^*$ (or equivalently, $\prod_\ell \mathbb{Z}_{m_\ell}^*$ by Equation (2.6)),
we obtain the complex matrix $\sigma(\vec{p}^T) = \mathrm{CRT}_m$. With this fact in mind, we now prove two basic facts about
the geometry of the powerful basis. The first says that all its elements are short (and in fact, by Lemma 2.14
they are shortest nonzero elements of $R$), and the second statement says essentially that the elements are
close to orthogonal.


Claim 4.2. The length of each element $p_j$ of $\vec{p}$ in $\ell_\infty$ norm is $\|p_j\|_\infty = 1$, and in $\ell_2$ norm is $\|p_j\|_2 =
\sqrt{\varphi(m)} = \sqrt{n}$.


Proof. Each entry in the $\mathrm{CRT}_m$ matrix is a root of unity, hence it has magnitude 1, and so the $\ell_\infty$ and $\ell_2$
norms of each column are 1 and $\sqrt{\varphi(m)}$, respectively. □

Lemma 4.3. The largest singular value of $\sigma(\vec{p}^T)$ (or equivalently, of $\mathrm{CRT}_m$) is $s_1(\vec{p}) = \sqrt{\hat{m}}$, and the
smallest singular value is $s_n(\vec{p}) = \sqrt{m/\mathrm{rad}(m)}$.

Notice that the ratio of $s_1(\vec{p})$ to $\sqrt{\varphi(m)}$ (i.e., the $\ell_2$ norm of each basis element) is just $\sqrt{\hat{m}/\varphi(m)} =
(\prod_p p/(p-1))^{1/2} = O(\sqrt{\log \log m})$, where the product runs over all odd primes dividing $m$.

^9 Although we define the powerful basis in a different way, it can be seen that it coincides with what Bosma [Bos90] calls the
“canonical” basis of $R$. Bosma’s work is the only one we know of that explicitly considers this basis.

Proof. It suffices to prove the statement when $m$ is a prime power, due to the tensor structure of $\mathrm{CRT}_m$, and
the fact that the vector of singular values of $A \otimes B$ is the tensor product of the two vectors of singular values
of $A$ and $B$. So let $m$ be a power of a prime $p$, and let $m' = m/p$. By Equation (3.2),

\[ \mathrm{CRT}_m = (\sqrt{m'}\, Q) \cdot (\mathrm{CRT}_p \otimes I_{[m']}) \]

for some unitary matrix $Q$, because $\mathrm{DFT}_{m'}/\sqrt{m'}$ is unitary for any $m'$, and so is the twiddle matrix $\hat{T}_m$. The
lemma then follows immediately from the fact that the $\varphi(p) = p - 1$ eigenvalues of the Gram matrix

\[ \mathrm{CRT}_p^* \cdot \mathrm{CRT}_p = p I_{[\varphi(p)]} - \mathbf{1} \cdot \mathbf{1}^T \tag{4.1} \]

are $p, \ldots, p$ ($p - 2$ times) and 1, where the asterisk denotes the conjugate transpose, $\mathbf{1} \in \mathbb{R}^{[\varphi(p)]}$ is the all-ones
vector, and the equality is by the fact that $\mathrm{CRT}_p$ is obtained by removing the all-1s row and one column from
$\mathrm{DFT}_p$, which is a unitary matrix scaled up by a $\sqrt{p}$ factor. □
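Equation (4.1) and the two eigenvalues can be checked numerically for a small prime. The sketch below assumes complex floating-point roots of unity and $p = 7$, both chosen for illustration. It builds $\mathrm{CRT}_p$, forms its Gram matrix, and compares it with $pI - \mathbf{1}\mathbf{1}^T$; it then tests one eigenvector for each eigenvalue instead of calling a general eigensolver.

```python
import cmath

def crt_p_complex(p):
    """Complex CRT_p: rows i in Z_p^* = {1..p-1}, columns j in [p-1]."""
    w = cmath.exp(2 * cmath.pi * 1j / p)
    return [[w ** (i * j) for j in range(p - 1)] for i in range(1, p)]

def gram(A):
    """Gram matrix A^* A (conjugate transpose times A)."""
    n = len(A[0])
    return [[sum(A[k][r].conjugate() * A[k][c] for k in range(len(A)))
             for c in range(n)] for r in range(n)]

p = 7
n = p - 1
G = gram(crt_p_complex(p))

# Entrywise distance from p*I - 1*1^T (diagonal p-1, off-diagonal -1).
max_err = max(abs(G[r][c] - (p - 1 if r == c else -1))
              for r in range(n) for c in range(n))

# The all-ones vector is an eigenvector with eigenvalue 1 ...
ones_image = [sum(row) for row in G]
# ... and e_0 - e_1 (orthogonal to all-ones) is an eigenvector with eigenvalue p.
diff_image = [row[0] - row[1] for row in G]
```

Up to rounding, `ones_image` is the all-ones vector and `diff_image` is $p \cdot (e_0 - e_1)$, consistent with singular values $\sqrt{p}$ and 1 for $\mathrm{CRT}_p$.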


We conclude this section by characterizing the Gram-Schmidt orthogonalization $\widetilde{\mathrm{CRT}}_m$ of the powerful
basis (under the canonical embedding), in Lemma 4.4 below. This orthogonalization is used in the nearest-plane
[Bab85] and Klein/GPV [GPV08] algorithms (see Lemma [T9]), which we use for sampling from discrete
Gaussians over $R$. The lemma implies that the orthogonalization is structured so that these algorithms can be
executed in substantially less time, and using much less precision, than is required for an arbitrary basis. This
is because the $U$ matrix associated with the orthogonalization is block diagonal with $m/\mathrm{rad}(m)$ identical
square blocks of dimension $\mathrm{rad}(m)$, which allows an implementation to make $m/\mathrm{rad}(m)$ parallel and
independent calls to a quadratic-time subroutine on dimension $\mathrm{rad}(m)$, for $O(m \cdot \mathrm{rad}(m))$ scalar operations
in total. Moreover, each row of $U$ has a small (common) denominator, allowing an implementation to compute
inner products with the rows of $U$ using low-precision integers (see the discussion following Lemma [T9]).

Recalling from Section 2.2 the matrix form of the Gram-Schmidt orthogonalization, it follows by the
mixed-product property that $\widetilde{A \otimes B} = \tilde{A} \otimes \tilde{B}$. By the tensor structure of $\mathrm{CRT}_m$, it therefore suffices to
consider the case where $m$ is a prime power.

Lemma 4.4. Let $m$ be a power of a prime $p$ and $m' = m/p$. Then

\[ \mathrm{CRT}_m = Q_m \cdot (\sqrt{m'}\, D_p \otimes I_{[m']}) \cdot (U_p \otimes I_{[m']}), \]

where $Q_m$ is unitary, $D_p$ is the real diagonal $[\varphi(p)]$-by-$[\varphi(p)]$ matrix with $\sqrt{(p-1) - j/(p-j)}$ in its $j$th
diagonal entry, and $U_p$ is the upper unitriangular $[\varphi(p)]$-by-$[\varphi(p)]$ matrix with $-1/(p - i - 1)$ in its $(i, j)$th
entry, for $0 \leq i < j < \varphi(p)$.


Proof. By Equation (3.2) and the fact that $\hat{T}_m$ and $\mathrm{DFT}_{m'}/\sqrt{m'}$ are unitary matrices, we have

\[ \mathrm{CRT}_m = \sqrt{m'}\, Q' \cdot (\mathrm{CRT}_p \otimes I_{[m']}) \]

for some unitary $Q'$. Thus it suffices to show that $\mathrm{CRT}_p = Q_p \cdot D_p \cdot U_p$ for some unitary $Q_p$.

Let $G = \mathrm{CRT}_p^* \cdot \mathrm{CRT}_p$ be the Gram matrix of $\mathrm{CRT}_p$ and recall from Equation (4.1) that $G$ has diagonal
entries $p - 1$, and $-1$ entries elsewhere. As discussed in Section 2.2, by the uniqueness of the Cholesky
decomposition it suffices to show that

\[ G = U_p^T \cdot D_p^2 \cdot U_p. \]

This equality can be verified by an elementary calculation, as follows. For $k \geq 1$, define

\[ T(k) := \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \cdots + \frac{1}{(k-1) \cdot k}. \]

It is easy to see (by induction, or by noticing that adding $1/k$ to the above collapses the sum) that $T(k) =
1 - 1/k$. For any $i \in [\varphi(p)]$, the $i$th diagonal entry in $U_p^T \cdot D_p^2 \cdot U_p$ is

\[ p - 1 - \frac{i}{p - i} + \sum_{k=0}^{i-1} \frac{1}{(p - k - 1)^2} \Bigl(p - 1 - \frac{k}{p - k}\Bigr). \]

The summation in the above expression is

\[ p \sum_{k=0}^{i-1} \frac{1}{(p-k)(p-k-1)} = p\bigl(T(p) - T(p-i)\bigr) = p\Bigl(1 - \frac{1}{p} - 1 + \frac{1}{p-i}\Bigr) = \frac{i}{p-i}, \]

and so the $i$th diagonal entry is $p - 1$, as required. The off-diagonal entries are calculated in essentially the
same way. □
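The identity $G = U_p^T \cdot D_p^2 \cdot U_p$ can also be confirmed with exact rational arithmetic, including the off-diagonal entries. The sketch below uses Python's `fractions` module and hypothetical helper names; it builds $D_p^2$ and $U_p$ from the formulas in Lemma 4.4 and multiplies them out.

```python
from fractions import Fraction

def cholesky_factors(p):
    """Return (D2, U): the diagonal of D_p^2 and the unitriangular U_p of Lemma 4.4."""
    n = p - 1                                           # phi(p)
    D2 = [Fraction(p - 1) - Fraction(j, p - j) for j in range(n)]
    U = [[Fraction(1) if i == j
          else (Fraction(-1, p - i - 1) if i < j else Fraction(0))
          for j in range(n)] for i in range(n)]
    return D2, U

def gram_from_factors(p):
    """Compute U_p^T * D_p^2 * U_p exactly."""
    D2, U = cholesky_factors(p)
    n = p - 1
    return [[sum(U[k][r] * D2[k] * U[k][c] for k in range(n))
             for c in range(n)] for r in range(n)]

def expected_gram(p):
    """p*I - 1*1^T: diagonal entries p-1, all other entries -1."""
    n = p - 1
    return [[Fraction(p - 1) if r == c else Fraction(-1) for c in range(n)]
            for r in range(n)]

checks = {p: gram_from_factors(p) == expected_gram(p) for p in (2, 3, 5, 7, 11)}
```

Because every entry of $U_p$ in row $i$ shares the denominator $p - i - 1$, an implementation can clear denominators row by row, which is the low-precision property mentioned before the lemma.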


5  The  Chinese  Remainder  Basis  and  Fast  Ring  Operations 

When working in $K$ or $R$, we can perform ring operations efficiently by representing elements under the
canonical embedding $\sigma$. Recall that $\sigma$ is the ring embedding from $K = \mathbb{Q}(\zeta_m)$ into the product ring
$H \subset \mathbb{C}^{\mathbb{Z}_m^*}$ that maps $\zeta_m$ to each power $\omega_m^i \in \mathbb{C}$ for $i \in \mathbb{Z}_m^*$, where $\omega_m$ is a primitive complex $m$th root of
unity. Under the canonical embedding, addition and multiplication simply apply coordinate-wise on each
complex coordinate. Converting to the embedding representation from the powerful basis $\vec{p}$ is done simply by
multiplying (with sufficient precision) by the complex matrix $\mathrm{CRT}_m = \sigma(\vec{p}^T)$, i.e., if $a = \langle \vec{p}, \mathbf{a} \rangle \in K$ for
some rational vector $\mathbf{a}$ then $\sigma(a) = \mathrm{CRT}_m \cdot \mathbf{a}$.

In ring-LWE and its applications, we often work in $R_q$ and $R_q^\vee$, and sometimes in $\mathcal{I}_q$ for $\mathcal{I} = (R^\vee)^k$,
where $q$ is a prime integer congruent to 1 modulo $m$.^10 While using the canonical embedding as above lets us
perform ring operations relatively efficiently in these quotients (by using an arbitrary set of representatives),
here we describe more efficient and practical algorithms that only use arithmetic in $\mathbb{Z}_q$, rather than on
high-precision complex numbers. These algorithms are facilitated by what we call the Chinese remainder
(CRT) basis for $\mathcal{I}_q$, defined next.

Recalling  that  R  =  (x),  Rf  where  m  =  rn,f  is  the  prime -power  factorization  of  m  and  Ilf  is  the  m/  th 
cyclotomic  ring,  it  is  easy  to  verify  that  the  quotient  ring  Rq  =  (R),  (Rf/qRi).  Therefore  we  may  focus  on 
the  case  of  prime -power  m.  Also  recall  from  Section  2.5.5  the  prime  ideal  factorization  (q)  =  n,ez*  T 
in  R,  where  q,  =  (q)  +  (£m  —  ujlm)  is  prime  in  R  and  c um  is  some  fixed  element  of  order  m  in  7Lq. 


Definition 5.1. For a positive integer $m$, the Chinese remainder (or CRT) $\mathbb{Z}_q$-basis $\vec{c}$ of $R_q$ is as follows:

•  For a prime power $m$, $\vec{c} = (c_i)_{i \in \mathbb{Z}_m^*}$ is characterized by $c_i = 1 \bmod \mathfrak{q}_i$, and $c_i = 0 \bmod \mathfrak{q}_j$ for $i \neq j$. (Its existence is guaranteed by Lemma 2.19, the Chinese remainder theorem.)

•  For $m$ having prime-power factorization $m = \prod_\ell m_\ell$, define $\vec{c} = \bigotimes_\ell \vec{c}_\ell$, the tensor product of the CRT bases $\vec{c}_\ell$ of each $R_\ell / qR_\ell$.

For any power $\mathcal{I} = (R^\vee)^k$ of $R^\vee = \langle t^{-1} \rangle$, the CRT $\mathbb{Z}_q$-basis of $\mathcal{I}_q$ is $t^{-k} \cdot \vec{c}$.

10 The modulus $q$ may also be a product of several primes $q_i = 1 \bmod m$, in which case we can use the Chinese Remainder Theorem to decompose $R_q$ into the product of rings $R_{q_i}$.

Note that $\vec{c}$ is a vector over $R_q$ having as its index set the Cartesian product $\prod_\ell \mathbb{Z}_{m_\ell}^*$, which may be flattened to the set $\mathbb{Z}_m^*$ using the bijective correspondence $(j_\ell) \mapsto j = \sum_\ell (m/m_\ell) \cdot j_\ell \in \mathbb{Z}_m^*$. But for our purposes it is usually more convenient to retain the structured index set.
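The flattening of the structured index set can be illustrated with a small sketch. It assumes the map sends a tuple $(j_\ell)$ to $\sum_\ell (m/m_\ell) \cdot j_\ell \bmod m$, and the helper names `units` and `flatten` are hypothetical.

```python
from itertools import product
from math import gcd, prod

def units(m):
    """The set Z_m^* as a sorted list of representatives in [1, m)."""
    return [j for j in range(1, m) if gcd(j, m) == 1]

def flatten(index, moduli):
    """Map (j_l) in prod_l Z_{m_l}^* to j = sum_l (m / m_l) * j_l mod m."""
    m = prod(moduli)
    return sum((m // ml) * jl for jl, ml in zip(index, moduli)) % m

# Example: m = 15 = 3 * 5; the eight index pairs cover Z_15^* exactly once.
moduli = (3, 5)
flat = sorted(flatten(idx, moduli) for idx in product(units(3), units(5)))
```

Here `flat` coincides with `units(15)`, which is what bijectivity onto $\mathbb{Z}_m^*$ requires; the moduli must be pairwise coprime prime powers for this to hold.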

Working in the CRT basis yields very fast arithmetic operations. Suppose that $m$ is a prime power. Since $c_i^2 = c_i \in R_q$ and $c_i \cdot c_{i'} = 0 \in R_q$ for distinct $i, i' \in \mathbb{Z}_m^*$, the CRT basis has the property that if $a, b \in R_q$ have coefficient vectors $\mathbf{a}, \mathbf{b}$ (respectively) over $\mathbb{Z}_q$ in the CRT basis (i.e., $a = \langle \vec{c}, \mathbf{a} \rangle$ and $b = \langle \vec{c}, \mathbf{b} \rangle$), then the coefficient vector of $a \cdot b \in R_q$ is the componentwise product $\mathbf{a} \odot \mathbf{b}$ over $\mathbb{Z}_q$. (Addition is componentwise as well, by linearity.) Moreover, this extends immediately to powers of $R^\vee$: if $\mathbf{a}, \mathbf{b}$ are the respective coefficient vectors of $a \in (R^\vee)^{k_1}_q$, $b \in (R^\vee)^{k_2}_q$ in the respective CRT bases $t^{-k_1} \cdot \vec{c}$ and $t^{-k_2} \cdot \vec{c}$, then $\mathbf{a} \odot \mathbf{b}$ is the coefficient vector of $a \cdot b \in (R^\vee)^{k}_q$ in the CRT basis $t^{-k} \cdot \vec{c}$, where $k = k_1 + k_2$.
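The componentwise-product property can be checked on a toy instance. The sketch below assumes $m = 8$ (so $R = \mathbb{Z}[x]/(x^4 + 1)$), $q = 17$, and $\omega_m = 2$, which has order 8 modulo 17; these parameters and all function names are chosen only for illustration, and the dense matrix products stand in for the fast transforms described in the text.

```python
Q = 17                 # prime modulus with Q = 1 (mod M)
M = 8                  # cyclotomic index (a prime power), R = Z[x]/(x^4 + 1)
N = 4                  # n = phi(M)
OMEGA = 2              # element of order M in Z_Q (2^4 = -1 mod 17)
UNITS = [1, 3, 5, 7]   # index set Z_M^*

def to_crt(a):
    """Power-basis coefficients -> CRT-basis coefficients (dense CRT_m over Z_q)."""
    return [sum(a[j] * pow(OMEGA, i * j, Q) for j in range(N)) % Q for i in UNITS]

def from_crt(v):
    """Inverse map; uses that the columns of CRT_m are orthogonal for M a power of 2."""
    n_inv = pow(N, Q - 2, Q)
    return [n_inv * sum(v[r] * pow(OMEGA, (-i * j) % M, Q)
                        for r, i in enumerate(UNITS)) % Q
            for j in range(N)]

def mul_power_basis(a, b):
    """Schoolbook product in Z_q[x]/(x^4 + 1), used as the reference."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            sign = 1 if i + j < N else -1          # x^4 = -1
            c[(i + j) % N] = (c[(i + j) % N] + sign * a[i] * b[j]) % Q
    return c

def mul_crt(u, v):
    """Ring multiplication in the CRT basis: componentwise product over Z_q."""
    return [(x * y) % Q for x, y in zip(u, v)]

# CRT basis elements c_i, written in the power basis.
c_basis = [from_crt([1 if r == s else 0 for r in range(N)]) for s in range(N)]
```

Multiplying in the CRT basis and converting back agrees with the schoolbook product, and the elements of `c_basis` are orthogonal idempotents, matching $c_i^2 = c_i$ and $c_i \cdot c_{i'} = 0$.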

Still treating $m$ as a prime power, using the field isomorphisms $R/\mathfrak{q}_i \cong \mathbb{Z}_q$ given by $\zeta_m \mapsto \omega_m^i$, we see that the CRT basis $\vec{c}$ and powerful basis $\vec{p} = (\zeta_m^j)$ of $R_q$ are related by

\[ \vec{p}^T = \vec{c}^T \cdot \mathrm{CRT}_m, \qquad (5.1) \]

where the matrix $\mathrm{CRT}_m$ is over $\mathbb{Z}_q$. So if $a \in R_q$ has coefficient vector $\mathbf{a} \in \mathbb{Z}_q^{[\varphi(m)]}$ in the powerful basis (i.e., $a = \langle \vec{p}, \mathbf{a} \rangle$), then its coefficient vector in the CRT basis is $\mathrm{CRT}_m \cdot \mathbf{a} \in \mathbb{Z}_q^{\mathbb{Z}_m^*}$ (i.e., $a = \langle \vec{c}, \mathrm{CRT}_m \cdot \mathbf{a} \rangle$), and similarly for $\mathcal{I}_q$ by linearity. Using the sparse decomposition of $\mathrm{CRT}_m$ and its inverse from Section 3 we can therefore switch efficiently between the power and Chinese remainder bases.

Finally, for arbitrary $m$, by the tensorial decomposition of $R_q$, multiplication is still componentwise in the CRT basis. Moreover, by the definitions of $\vec{p}$, $\vec{c}$, and $\mathrm{CRT}_m$ as tensor products and the mixed-product property, it immediately follows that Equation (5.1) holds as well.


6  The Decoding Basis of $R^\vee$


When working with ring-LWE we need to perform a variety of operations over $R^\vee = \langle t^{-1} \rangle$ or $R_q^\vee$. For certain operations it is best to use a certain $\mathbb{Z}$-basis of $R^\vee$ (and $\mathbb{Z}_q$-basis of $R_q^\vee$), defined below.

Let $\tau$ be the automorphism (and involution) of $K$ that maps $\zeta_m$ to $\zeta_m^{-1} = \zeta_m^{m-1}$. We refer to $\tau$ as the conjugation map, since under the canonical embedding it corresponds to complex conjugation: $\sigma(\tau(a)) = \overline{\sigma(a)}$. Notice that for any $m'$ dividing $m$, $\tau$ also maps $\zeta_{m'} = \zeta_m^{m/m'}$ to $\zeta_{m'}^{-1} = \zeta_{m'}^{m'-1}$. Also note that $\tau(\vec{p})$ is a $\mathbb{Z}$-basis of $R$, since $\tau$ is an automorphism and hence fixes $R$.

Definition 6.1. The decoding basis of $R^\vee$ is $\vec{d} = \tau(\vec{p})^\vee$, the dual of the conjugate of the powerful basis $\vec{p}$.$^{11}$


The decoding basis therefore has the same index set as $\vec{p}$. When $m$ is a prime power, $\vec{d}$ is simply the dual of the conjugate power basis $\tau(\vec{p}) = (\zeta_m^{-j})_{j \in [\varphi(m)]}$ of $R$. For general $m$, because $\tau(\vec{p})$ is the tensor product of the conjugate power bases for prime-power cyclotomics $R_\ell$, and $(\vec{a} \otimes \vec{b})^\vee = (\vec{a}^\vee \otimes \vec{b}^\vee)$, it follows that $\vec{d}$ is the tensor product of the decoding bases for each $R_\ell^\vee$.

11 Note that unlike the powerful and CRT bases, we do not define a decoding basis for any other power of $R^\vee$; see Section 6.2 for discussion. Also, there is some flexibility in the choice of $\vec{d}$, and other definitions may be nearly as good, e.g., $\vec{d} = \vec{p}^\vee$ (without conjugation). We adopt the above definition because it corresponds to the adjoint of $\sigma(\vec{p}^T)$, and yields a particularly simple connection between $\vec{d}$ and the powerful basis $t^{-1}\vec{p}$ of $R^\vee$ (see Lemma 6.3).

We start with some basic facts about the decoding basis. Any $a \in K_{\mathbb{R}}$ can be represented in the decoding basis as $a = \langle \vec{d}, \mathbf{a} \rangle$ for some vector $\mathbf{a}$ of real coefficients, given by

\[ a_j = \mathrm{Tr}(a \cdot d_j^\vee) = \mathrm{Tr}(a \cdot \tau(p_j)) = \langle \sigma(a), \sigma(p_j) \rangle \;\Longrightarrow\; \mathbf{a} = \mathrm{CRT}_m^* \cdot \sigma(a). \qquad (6.1) \]

Since $\vec{d}$ is the dual basis of $\tau(\vec{p})$, which embeds as $\sigma(\tau(\vec{p}^T)) = \overline{\mathrm{CRT}_m}$ over $\mathbb{C}$, we have that $\vec{d}$ embeds as

\[ \sigma(\vec{d}^T) = (\mathrm{CRT}_m^*)^{-1}. \]

Lemma 4.3, together with the fact that complex conjugation leaves singular values unchanged, implies the following geometric fact about the decoding basis.

Lemma 6.2. The spectral norm of $\vec{d}$ is $s_1(\vec{d}) = \sqrt{\mathrm{rad}(m)/m}$.

We point out that $s_1(\vec{d})$ can be as large as 1 (in the extreme case where $m$ is square free), which, unlike for $\vec{p}$ (see Lemma 4.3), is much larger than the normalized determinant $\det(R^\vee)^{1/n} = \Delta_K^{-1/(2n)} \approx 1/\sqrt{n}$. Fortunately, the decoding basis is still always a good choice for discretizing a continuous ring-LWE error distribution (while increasing the subgaussian parameter only slightly), because the input error distribution needs to have Gaussian parameter at least $\omega(\sqrt{\log n})$ for provable worst-case hardness (see Theorem 2.22). We also point out that if $\vec{d}$ were instead defined as the dual of the power basis (or its conjugate), then its spectral norm could be much larger: e.g., for $m = 1155 = 3 \cdot 5 \cdot 7 \cdot 11$ we would have $s_1(\vec{d}) \approx 22.6$.

In  the  next  few  subsections,  we  prove  several  important  and  useful  properties  of  the  decoding  basis, 
summarized  as  follows: 


•  There are very fast linear transformations (requiring $O(nd)$ scalar operations with small hidden constant, where $d$ is the number of prime divisors of $m$) for converting between the decoding basis $\vec{d}$ and the powerful basis $t^{-1}\vec{p}$ of $R^\vee$ (see Section 6.1).

•  Short elements (as always, in the sense of the canonical embedding) of $K_{\mathbb{R}}$ have optimally small coefficients with respect to $\vec{d}$, making it a best choice for decoding $R^\vee$. Moreover, $\vec{d}$ also yields (nearly) optimal decoding in higher powers of $R^\vee$. (See Section 6.2.)

•  Continuous Gaussians (especially spherical ones) as represented in the decoding basis can be sampled very simply and efficiently (see Section 6.3).


The first fact, combined with the fast CRT transformation, means that we can efficiently convert among the decoding, power, and CRT bases of $R^\vee$ (or $R_q^\vee$) as needed. The latter two facts mean that the decoding basis is an excellent choice for generating and decoding error terms (e.g., in encryption and decryption, respectively). By contrast, the power basis and other natural bases of $R$ or $R^\vee$ do not typically enjoy the above properties (except when $m$ is a power of 2), and while they can in principle be used for all the same tasks, it would come at a potentially large loss in tightness and/or computational efficiency.


6.1  Relation  to  the  Powerful  Basis 

Recall that both $\vec{d}$ and $t^{-1}\vec{p}$ are $\mathbb{Z}$-bases of $R^\vee$, so there is a unimodular transformation that relates them, which is given in the following lemma.

Lemma 6.3. Let $m$ be a power of a prime $p$ and let $m' = m/p$, so $\varphi(m) = \varphi(p) \cdot m'$. Then

\[ \vec{d}^T = t^{-1}\vec{p}^T \cdot (L_p \otimes I_{[m']}), \qquad (6.2) \]

where $L_p \in \mathbb{Z}^{[\varphi(p)] \times [\varphi(p)]}$ is the lower-triangular matrix with 1s throughout its lower-left triangle, i.e., its $(i, j)$th entry is 1 for $i \geq j$, and 0 otherwise.


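Proof. First reindex the conjugate power basis using index set $[\varphi(p)] \times [m']$, as $\tau(p)_{(j_0, j_1)} = \zeta_p^{-j_0} \cdot \zeta_m^{-j_1}$, and reindex $\vec{d}$ similarly. Equation (6.2) may then be rewritten equivalently as

\[ d_{(j_0, j_1)} = t^{-1} \cdot (\zeta_p^{j_0} + \zeta_p^{j_0 + 1} + \cdots + \zeta_p^{p-2}) \cdot \zeta_m^{j_1} = \frac{1}{m} (\zeta_p^{j_0} - \zeta_p^{p-1}) \cdot \zeta_m^{j_1}, \qquad (6.3) \]

where recall from Definition 2.17 that $t^{-1} = (1 - \zeta_p)/m$. To verify the above equation, observe that the product of the right-hand expression with $\tau(p_{(j_0', j_1')})$ for any $(j_0', j_1') \in [\varphi(p)] \times [m']$ is

\[ \frac{1}{m} \bigl( \zeta_p^{j_0 - j_0'} - \zeta_p^{p - 1 - j_0'} \bigr) \cdot \zeta_m^{j_1 - j_1'}. \]

By Lemma 2.15 the trace of this is 0 if $j_1 \neq j_1'$ (because $j_1 - j_1' \neq 0 \bmod m'$); otherwise it is 0 if $j_0 \neq j_0'$ (because both $j_0 - j_0'$ and $p - 1 - j_0'$ are nonzero modulo $p$); otherwise, it is 1, as desired. □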


Observe that multiplication by $L_p$ can be done in $O(\varphi(p))$ scalar operations via partial sums, and similarly for $L_p^{-1}$ via successive differences. Therefore, multiplication by $L_m = (L_p \otimes I_{[m']})$ or $L_m^{-1}$ can be done in a linear number of scalar operations. Finally, for arbitrary $m$ having prime-power factorization $m = \prod_\ell m_\ell$, by the definitions of $\vec{p}$, $\vec{d}$, and $t$ as tensor products and the mixed-product property, we also have

\[ \vec{d}^T = t^{-1}\vec{p}^T \cdot L_m, \quad \text{where } L_m = \bigotimes_\ell L_{m_\ell}. \qquad (6.4) \]

By the discussion at the end of Section 2.1 we can therefore multiply by $L_m$ or $L_m^{-1}$ in $O(nd)$ scalar operations, where $d$ is the number of distinct prime divisors of $m$ and $n = \varphi(m)$.
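The partial-sum and successive-difference remark can be sketched for a single prime power as follows. The flattening of the index pair $(j_0, j_1)$ to position $j_0 \cdot m' + j_1$ and the function names are assumptions made for this illustration; by Equation (6.2), applying $L_p \otimes I_{[m']}$ to a coefficient vector in the decoding basis yields the coefficients of the same element in the basis $t^{-1}\vec{p}$.

```python
from itertools import accumulate

def apply_L(a, p, m_prime=1):
    """Multiply by L_p (x) I_[m']: prefix sums along j0 for each fixed j1."""
    assert len(a) == (p - 1) * m_prime
    out = list(a)
    for j1 in range(m_prime):
        column = a[j1::m_prime]                  # entries (0, j1), (1, j1), ...
        out[j1::m_prime] = list(accumulate(column))
    return out

def apply_L_inv(b, p, m_prime=1):
    """Multiply by the inverse: successive differences along j0 for each j1."""
    assert len(b) == (p - 1) * m_prime
    out = list(b)
    for j1 in range(m_prime):
        column = b[j1::m_prime]
        diffs = [column[0]] + [column[i] - column[i - 1] for i in range(1, len(column))]
        out[j1::m_prime] = diffs
    return out
```

Both routines touch every entry a constant number of times, so they run in time linear in $\varphi(m)$; for general $m$ one would apply them once per prime-power factor, in line with the $O(nd)$ bound.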


6.2  Decoding $R^\vee$ and Its Powers


Recall from Section 2.4.1 the “round-off” decoding procedure, which uses short linearly independent vectors in a dual lattice $\Lambda^\vee$ to recover a sufficiently short $x$, given $x \bmod \Lambda$. To decode $K_{\mathbb{R}}/R^\vee$, we apply the procedure using the decoding basis $\vec{d}$ of $R^\vee$, whose dual basis in $(R^\vee)^\vee = R$ is the conjugate powerful basis $\tau(\vec{p})$. By Claim 2.10 the distance (or subgaussian parameter) that the procedure successfully decodes from depends inversely on the maximum length of the dual elements, and by Claim 4.2 every $p_j$ in the powerful basis has $\|\tau(p_j)\|_2 = \sqrt{n}$. From this we get corresponding bounds on the decoding operation, as summarized below in Lemmas 6.5 and 6.6. We remark that the decoding basis is an optimal choice here: by Lemma 2.14 every nonzero element of $R$ has length at least $\sqrt{n}$, hence no shorter set of dual elements exists.

In some applications (e.g., homomorphic encryption), we need to solve the more general problem of decoding $K_{\mathbb{R}}/\mathcal{I}$, where $\mathcal{I} = (R^\vee)^k = \langle t^{-k} \rangle$ for some (usually small) $k \geq 1$. The naive way to do this would be to apply the round-off procedure with the $\mathbb{Z}$-basis $t^{1-k}\vec{d}$ of $\mathcal{I}$. This, however, turns out to be highly suboptimal for many values of $m$, because the elements of the dual basis $t^{k-1}\tau(\vec{p})$ might be much longer than the shortest nonzero elements of $\mathcal{I}^\vee = \langle t^{k-1} \rangle$.$^{12}$

Instead, in the round-off algorithm we use the scaled decoding basis $\hat{m}^{1-k}\vec{d}$, which generates the superideal $\mathcal{J} = \hat{m}^{1-k}R^\vee = t^{-k}g^{1-k}R \supseteq \mathcal{I}$, and whose dual elements are $\hat{m}^{k-1}\tau(\vec{p}) \subset \mathcal{I}^\vee$. (Recall from Definition 2.17 that $\hat{m} = t \cdot g$ for some $g \in R$, where $\hat{m} = m/2$ if $m$ is even and $\hat{m} = m$ otherwise.) The lengths of the dual elements are therefore $\hat{m}^{k-1}\sqrt{n}$, from which one gets the bounds summarized in Lemma 6.6 below.

12 This can be seen already when $k = 2$ and $m$ is a moderately large prime: using the equality $t = \hat{m}/g$ and noticing that some of the embeddings of $g = 1 - \zeta_m$ are very close to zero, we see that the length of $t$ is a rather large $\Omega(m^2)$.

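We point out that the use of $\hat{m}^{1-k}\vec{d}$ for decoding $K_{\mathbb{R}}/\mathcal{I}$ is either optimal or nearly so. Indeed, by Lemma 2.14 and Equation (2.10), the minimum distance of $\mathcal{I}^\vee = (R^\vee)^{1-k}$ is at least $\sqrt{n} \cdot \mathrm{N}((R^\vee)^{1-k})^{1/n} = \sqrt{n} \cdot \Delta_K^{(k-1)/n}$, so by the expression for the discriminant $\Delta_K$ the dual elements $\hat{m}^{k-1}\tau(\vec{p}) \subset \mathcal{I}^\vee$ are nearly as short as possible:

\[ \frac{\|\hat{m}^{k-1}\tau(p_j)\|_2}{\lambda_1(\mathcal{I}^\vee)} = \frac{\hat{m}^{k-1}\sqrt{n}}{\lambda_1(\mathcal{I}^\vee)} \leq \Bigl( \prod_{\text{odd prime } p \mid m} p^{1/(p-1)} \Bigr)^{k-1}, \]

which for almost all choices of $m$ and small $k$ is quite small. (For example, the term inside the parentheses is only $\approx 6.73$ when taking all odd primes up to 17, which corresponds to $m \geq 255{,}255$.) Moreover, the above lower bound on $\lambda_1(\mathcal{I}^\vee)$ may not be tight; we suspect that in most cases of interest the minimum distance of $\mathcal{I}^\vee$ is exactly $\hat{m}^{k-1}\sqrt{n}$, which would imply that the scaled decoding basis is optimal.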
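The numerical example in the preceding paragraph is easy to reproduce. The sketch below assumes the quantity in parentheses is the product of $p^{1/(p-1)}$ over the odd primes dividing $m$; the function name is illustrative.

```python
from math import prod

def scaled_basis_gap(odd_primes, k=2):
    """(product of p^(1/(p-1)) over the given odd primes) raised to the power k - 1."""
    return prod(p ** (1.0 / (p - 1)) for p in odd_primes) ** (k - 1)

primes = [3, 5, 7, 11, 13, 17]
gap = scaled_basis_gap(primes)       # the term inside the parentheses (k = 2)
smallest_m = prod(primes)            # smallest m divisible by all of these primes
```

With these primes, `gap` is approximately 6.73 and `smallest_m` is 255255, in agreement with the figures quoted above.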

We summarize the above discussion in the following definition and lemmas. As it will be more convenient for applications, we consider a “scaled up and discretized” version of the decoding procedure, where we decode from $\mathcal{I}_q$ to $\mathcal{I}$ for some $q \geq 1$. So the unknown short element is guaranteed to be in $\mathcal{I}$ but is given modulo $q\mathcal{I}$, and the output is also expected to be in $\mathcal{I}$. The only difference this makes (apart from the scaling by $q$) is that for $k \geq 2$, since the scaled decoding basis $\hat{m}^{1-k}\vec{d}$ may generate a strict superideal $\mathcal{J} \supsetneq \mathcal{I}$, the round-off procedure might output an element that is not in $\mathcal{I}$. In such a case we just consider the output to be undefined. Lemmas 6.5 and 6.6 show that as long as the unknown element in $\mathcal{I}$ is short enough (or has a small enough subgaussian parameter), the decoding procedure correctly outputs it.

Definition 6.4 (Decoding $\mathcal{I}_q$ to $\mathcal{I}$). Let $\mathcal{I} = (R^\vee)^k$ for some $k \geq 1$, and define the decoding function $\llbracket \cdot \rrbracket \colon \mathcal{I}_q \to \mathcal{I}$ as follows. For input $\bar{a} \in \mathcal{I}_q$, write $\bar{a} = \langle \hat{m}^{1-k}\vec{d}, \bar{\mathbf{a}} \rangle \bmod q\mathcal{J}$ for some vector $\bar{\mathbf{a}}$ over $\mathbb{Z}_q$, where $\mathcal{J} = \hat{m}^{1-k}R^\vee \supseteq \mathcal{I}$. Define $\llbracket \bar{a} \rrbracket := \langle \hat{m}^{1-k}\vec{d}, \llbracket \bar{\mathbf{a}} \rrbracket \rangle$ if this value is in $\mathcal{I}$, otherwise $\llbracket \bar{a} \rrbracket$ is undefined. (Recall that $\llbracket \bar{\mathbf{a}} \rrbracket$ is a vector over $\mathbb{Z}$, as defined in the beginning of Section 2.)

Lemma 6.5. Let $\mathcal{I} = (R^\vee)^k$ for some $k \geq 1$, let $a \in \mathcal{I}$ and write $a = \langle \hat{m}^{1-k}\vec{d}, \mathbf{a} \rangle$ for some integral coefficient vector $\mathbf{a}$, and let $q \geq 1$ be an integer. If every coefficient $a_j \in [-q/2, q/2)$, then $\llbracket a \bmod q\mathcal{I} \rrbracket = a$. In particular, if every $a_j$ is $\delta$-subgaussian with parameter $s$, then $\llbracket a \bmod q\mathcal{I} \rrbracket = a$ except with probability at most $2n \exp(\delta - \pi q^2/(2s)^2)$.

Proof. The first part is by Claim 2.10. The second part is by the tail bound on subgaussian random variables (Equation (2.2)), and the union bound. □


Lemma 6.6. Let $\mathcal{I} = (R^\vee)^k$ for some $k \geq 1$, and let $a \in \mathcal{I}$.

•  Writing $a = \langle \hat{m}^{1-k}\vec{d}, \mathbf{a} \rangle$ for some integral vector $\mathbf{a}$, we have that every $|a_j| \leq \hat{m}^{k-1}\sqrt{n} \cdot \|a\|_2$.

•  If $a$ is $\delta$-subgaussian with parameter $s$, and $b \in (R^\vee)^\ell$ for some $\ell \geq 0$ is arbitrary, then writing $a \cdot b = \langle \hat{m}^{1-k-\ell}\vec{d}, \mathbf{c} \rangle$ for some integral vector $\mathbf{c}$, we have that every $c_j$ is $\delta$-subgaussian with parameter $\hat{m}^{k+\ell-1} \|b\|_2 \cdot s$.

We remark that the second item above gives a bound that is a $\sqrt{n}$ factor tighter than what we would obtain by treating $a \cdot b$ as $\delta$-subgaussian with parameter $s\|b\|_2$. The tighter bound results from using the particular properties of the powerful basis, namely, that all its elements have $\ell_\infty$ norm 1.

Proof. The dual elements of $\hat{m}^{1-k}\vec{d}$ are $\hat{m}^{k-1}\tau(\vec{p})$, which all have $\ell_2$ norm $\hat{m}^{k-1}\sqrt{n}$. The first item then follows by the Cauchy-Schwarz inequality.

For the second item, notice that the coefficient $c_j$ of $a \cdot b$ in the scaled decoding basis $\hat{m}^{1-k-\ell}\vec{d}$ is

\[ c_j = \mathrm{Tr}(\hat{m}^{k+\ell-1}\tau(p_j) \cdot ab) = \hat{m}^{k+\ell-1}\,\mathrm{Tr}((\tau(p_j) \cdot b) \cdot a), \]

which by definition and by Claim 4.2 is $\delta$-subgaussian with parameter

\[ \hat{m}^{k+\ell-1} \|\tau(p_j) \cdot b\|_2 \cdot s \leq \hat{m}^{k+\ell-1} \|\tau(p_j)\|_\infty \cdot \|b\|_2 \cdot s = \hat{m}^{k+\ell-1} \|b\|_2 \cdot s. \qquad □ \]


6.2.1  Implementation  Notes 

We conclude this subsection by outlining an efficient implementation of the decoding operation from Definition 6.4. As usual, we wish to use only (nearly) linear time operations, and avoid high-precision quantities. Recall that our goal is to recover an unknown element $a \in \mathcal{I}$ given $\bar{a} = a \bmod q\mathcal{I}$, where $\mathcal{I} = (R^\vee)^k$ for some $k \geq 1$. We assume that the input $\bar{a} \in \mathcal{I}_q$ is given in the form of a coefficient vector $\bar{\mathbf{a}}$ over $\mathbb{Z}_q$ satisfying $\bar{a} = \langle t^{1-k}\vec{b}, \bar{\mathbf{a}} \rangle \bmod q\mathcal{I}$, where $\vec{b}$ is some $\mathbb{Z}_q$-basis of $R_q^\vee$. The output will be given as a coefficient vector $\mathbf{a}$ over $\mathbb{Z}$ with respect to the decoding basis $t^{1-k}\vec{d}$ of $\mathcal{I}$.

The case $k = 1$ can be implemented straightforwardly. Suppose the basis $\vec{b}$ used to specify the input $\bar{a} \in R_q^\vee$ is the decoding basis, i.e., $\bar{a} = \langle \vec{d}, \bar{\mathbf{a}} \rangle \bmod qR^\vee$. We then simply output the integer coefficient vector $\mathbf{a} = \llbracket \bar{\mathbf{a}} \rrbracket$ also relative to the decoding basis, i.e., $a = \langle \vec{d}, \mathbf{a} \rangle \in R^\vee$. The number of operations is clearly linear. If the input is represented in a different basis $\vec{b}$, we first convert to the decoding basis, which is very efficient for all bases we consider.
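For the case $k = 1$ with input already in the decoding basis, the procedure reduces to lifting each coefficient to its representative in $[-q/2, q/2)$. A minimal sketch, assuming coefficients are plain Python integers and with hypothetical function names:

```python
def center(x, q):
    """Representative of x mod q in the interval [-q/2, q/2)."""
    r = x % q
    return r - q if 2 * r >= q else r

def decode_k1(coeffs_mod_q, q):
    """Round-off decoding for k = 1: lift each decoding-basis coefficient."""
    return [center(x, q) for x in coeffs_mod_q]

# A short element survives reduction mod q and is recovered exactly.
q = 17
short = [3, -8, 0, 8]
recovered = decode_k1([x % q for x in short], q)
```

Here `recovered` equals `short` because every coefficient lies in $[-q/2, q/2)$, as in the first part of Lemma 6.5; a coefficient outside that interval would be lifted to a different integer.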

The case $k > 1$ is more interesting, and consists of three efficient steps:

1.  compute the representation of $\bar{a}' = \bar{a} \bmod q\mathcal{J}$ in the $\mathbb{Z}_q$-basis $\hat{m}^{1-k}\vec{b}$ of $\mathcal{J}_q$ (where recall that $\mathcal{J} = \hat{m}^{1-k}R^\vee \supseteq \mathcal{I}$);

2.  decode it as in the case $k = 1$ to an element $a' \in \mathcal{J}$ (which will equal $a$ if decoding was successful);

3.  compute the representation of $a'$ in the $\mathbb{Z}$-basis $t^{1-k}\vec{d}$ of $\mathcal{I}$.

We next explain each of the three steps in detail.

The first step, it turns out, is equivalent to multiplication by $g^{k-1} \in R$, where recall from Definition 2.17 that $\hat{m} = g \cdot t$. Indeed, by factoring out $g^{k-1}$ from the modulus and both sides of the equality, we have

\[ g^{k-1} \cdot \bar{a} = \langle t^{1-k}\vec{b}, \bar{\mathbf{a}}' \rangle \bmod q\mathcal{I} \;\Longleftrightarrow\; \bar{a} = \langle \hat{m}^{1-k}\vec{b}, \bar{\mathbf{a}}' \rangle \bmod q\mathcal{J}, \]

i.e., the desired coefficients of $\bar{a} \bmod q\mathcal{J}$ in basis $\hat{m}^{1-k}\vec{b}$ are exactly those of $g^{k-1}\bar{a}$ in basis $t^{1-k}\vec{b}$. Typically the input basis $\vec{b}$ at this stage would be the CRT basis, and for efficiency one could precompute the CRT coefficients of $g^{k-1}$, making this step linear time. In addition, multiplication by $g$ in the powerful and decoding bases is also (nearly) linear time, as described below.

The second step is essentially identical to the case $k = 1$. Take the output $\bar{a}'$ of the first step, convert it (if needed) to a representation in the scaled decoding basis $\hat{m}^{1-k}\vec{d}$, so that $\bar{a}' = \langle \hat{m}^{1-k}\vec{d}, \bar{\mathbf{a}}' \rangle$ for some $\bar{\mathbf{a}}'$ over $\mathbb{Z}_q$, and then output the coefficient vector $\llbracket \bar{\mathbf{a}}' \rrbracket$ over $\mathbb{Z}$, which represents the element $a' = \langle \hat{m}^{1-k}\vec{d}, \llbracket \bar{\mathbf{a}}' \rrbracket \rangle \in \mathcal{J}$. The element $a'$ is exactly the output of the decoding procedure as in Definition 6.4, except that it might not be in $\mathcal{I}$ (in which case decoding failed).

Finally, in the third step, we convert the representation of $a'$ in the $\mathbb{Z}$-basis $\hat{m}^{1-k}\vec{d}$ of $\mathcal{J}$ to a representation in a $\mathbb{Z}$-basis of $\mathcal{I}$, namely $t^{1-k}\vec{d}$. This conversion might be impossible if $a' \notin \mathcal{I}$, which indicates decoding failure. Assuming $a' \in \mathcal{I}$, it is immediate to see that this conversion is equivalent to division by $g^{k-1}$:

\[ g^{1-k} \cdot a' = \langle \hat{m}^{1-k}\vec{d}, \mathbf{a} \rangle \in \mathcal{J} \;\Longleftrightarrow\; a' = \langle t^{1-k}\vec{d}, \mathbf{a} \rangle \in \mathcal{I}, \]

i.e., the desired coefficients of $a'$ in the $\mathbb{Z}$-basis $t^{1-k}\vec{d}$ of $\mathcal{I}$ are exactly those of $g^{1-k} \cdot a'$ in basis $\hat{m}^{1-k}\vec{d}$.

Division by $g^{k-1}$ can be performed somewhat efficiently using the CRT transform over $\mathbb{C}$, but this requires $\Omega(n \log n)$ time and high-precision operations (since in contrast with the first step, here we are working with $\mathbb{Z}$-bases, and not modulo $q$). A better way follows from noticing that multiplication and division by $g$ have nice forms in the decoding basis, i.e., $g \cdot \vec{d}^T = \vec{d}^T \cdot A$ for some integral matrix $A$ that is efficient to multiply and divide by. By the tensorial decompositions of $\vec{d}$ and of $g$, it suffices to consider the case where $m$ is a power of a prime $p$. Using Equation (6.3) and letting $m' = m/p$, one can verify that multiplication and division by $g = 1 - \zeta_p$ in the decoding basis are given, respectively, by the $[n]$-by-$[n]$ matrices

\[ A = \begin{pmatrix} 2 & 1 & \cdots & 1 & 1 \\ -1 & 1 & & & \\ & -1 & \ddots & & \\ & & \ddots & 1 & \\ & & & -1 & 1 \end{pmatrix} \otimes I_{[m']}, \qquad A^{-1} = \frac{1}{p} \begin{pmatrix} 1 & 2-p & 3-p & \cdots & -1 \\ 1 & 2 & 3-p & \cdots & -1 \\ 1 & 2 & 3 & \cdots & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & p-1 \end{pmatrix} \otimes I_{[m']}. \]

It is easy to see that left-multiplication by $A$ can be performed in time linear in $n$. Moreover, multiplication by $A^{-1}$ can also be done in linear time, because every row differs from each of its adjacent rows in just one entry. Note that to avoid rational arithmetic, one would actually multiply by the integer matrix $pA^{-1}$ and then evenly divide the result by $p$. If the latter step is not possible, that indicates decoding failure.
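The linear-time claims for $A$ and $A^{-1}$ can be made concrete. The sketch below treats only the prime case $m = p$ (so $m' = 1$ and $n = p - 1$) and assumes the matrix entries as described above: first row $(2, 1, \ldots, 1)$, then $-1$ on the subdiagonal and $1$ on the diagonal for $A$, and for $pA^{-1}$ the entry in row $i$, column $j$ (counting from 0) equal to $j + 1$ if $j \leq i$ and $j + 1 - p$ otherwise. Function names are illustrative.

```python
def mul_g(a, p):
    """Return A * a in linear time (coefficients in the decoding basis, m = p)."""
    assert len(a) == p - 1
    # Row 0 is (2, 1, ..., 1); row i >= 1 has -1 at column i-1 and 1 at column i.
    return [a[0] + sum(a)] + [a[i] - a[i - 1] for i in range(1, p - 1)]

def div_g(b, p):
    """Return A^{-1} * b if it is integral, else None (decoding failure)."""
    assert len(b) == p - 1
    weighted = sum((j + 1) * bj for j, bj in enumerate(b))
    suffix = sum(b)
    out = []
    for bi in b:
        suffix -= bi                    # sum of b[j] over j > i
        c = weighted - p * suffix       # i-th entry of (p * A^{-1}) * b
        if c % p != 0:
            return None
        out.append(c // p)
    return out
```

For example, with $p = 5$ the vector $(3, -1, 4, 1)$ maps to $(10, -4, 5, -3)$ under `mul_g`, and `div_g` recovers it; an input such as $(1, 0, 0, 0)$ is not divisible by $g$ and yields `None`. The running sum `suffix` is what exploits the fact that adjacent rows of $A^{-1}$ differ in a single entry.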

Lastly, we also note that multiplication by $g$ in the powerful basis is given by $JA^TJ$, where $J = J_{[n]}$ is the $[n]$-by-$[n]$ reversal matrix, obtained by reversing the columns of the identity matrix $I_{[n]}$ (so $J = J^{-1}$ and $J_{[n]} = J_{[\varphi(p)]} \otimes J_{[m']}$). Therefore, in the powerful basis we can also multiply and divide by $g$ in linear time per prime-power divisor of $m$.


6.3  Sampling  Gaussians  in  the  Decoding  Basis 


We now describe how to efficiently sample continuous Gaussians over $K_{\mathbb{R}}$, as represented in the decoding basis. In order to obtain the real coefficient vector $\mathbf{a}$ of some Gaussian-distributed $a \in K_{\mathbb{R}}$, by Equation (6.1) it suffices to sample $\sigma(a)$ from the continuous Gaussian distribution over $H$ and then left-multiply by $\mathrm{CRT}_m^*$. The latter step is best done using the sparse decomposition given in Section 3. Recalling the definition of $H$ and its unitary basis matrix $B = \frac{1}{\sqrt{2}} \begin{pmatrix} I & \sqrt{-1}\,J \\ J & -\sqrt{-1}\,I \end{pmatrix} \in \mathbb{C}^{\mathbb{Z}_m^* \times [\varphi(m)]}$ from Section 2.2, we see that sampling $\sigma(a)$ amounts to sampling $n$ independent real Gaussians used as coefficients for the columns of $B$, or equivalently, sampling the first $n/2$ complex coordinates as independent complex Gaussians, and completing the remaining $n/2$ coordinates using the conjugate symmetry of $H$.

While the above is already quite efficient, here we show that a significantly faster algorithm exists when $\mathrm{rad}(m) \ll m$. The basic idea is to notice that multiplication by the matrix $\mathrm{CRT}_m^*$, with its decomposition as in Equation (3.2), starts with multiplication by two scaled unitary matrices: a (typically high-dimensional) DFT tensored with identity, and a twiddle matrix. Since spherical Gaussians are invariant under unitary transformations, we can effectively skip these two multiplications, and we only need to multiply by the (often much lower-dimensional) $\mathrm{CRT}_p^*$ matrices for those primes $p$ dividing $m$. Details follow.

Using Equation (3.2), for any $m_\ell$ that is a power of a prime $p_\ell$, letting $m_\ell' = m_\ell/p_\ell$ we have

\[ \mathrm{CRT}_{m_\ell}^* = (\mathrm{CRT}_{p_\ell}^* \otimes I_{[m_\ell']}) \cdot \sqrt{m_\ell'} \cdot Q_\ell \]

for some unitary $Q_\ell$, because the twiddle matrix $T_{m_\ell}$ and scaled Fourier matrix $\mathrm{DFT}_{m_\ell'}/\sqrt{m_\ell'}$ are both unitary. Therefore, by the mixed-product property we have

\[ \mathrm{CRT}_m^* = \bigotimes_\ell \mathrm{CRT}_{m_\ell}^* = \bigotimes_\ell (\mathrm{CRT}_{p_\ell}^* \otimes I_{[m_\ell']}) \cdot \sqrt{m/\mathrm{rad}(m)} \cdot \bigotimes_\ell Q_\ell. \]

Since $Q = \bigotimes_\ell Q_\ell$ is unitary, it sends a spherical Gaussian distribution over $H \subset \mathbb{C}^{\mathbb{Z}_m^*}$ to a spherical Gaussian distribution (of the same parameter) over the subspace $H' = QH \subset \mathbb{C}^{\mathbb{Z}_m^*}$.$^{13}$ Therefore, to sample a continuous Gaussian of parameter $s$ in the decoding basis, it suffices to generate a Gaussian of parameter $s\sqrt{m/\mathrm{rad}(m)}$ over $H'$ and then left-multiply the result by

\[ C^* := \bigotimes_\ell (\mathrm{CRT}_{p_\ell}^* \otimes I_{[m_\ell']}) \cong \mathrm{CRT}_{\mathrm{rad}(m)}^* \otimes I_{[m/\mathrm{rad}(m)]}. \]

The latter requires $n/\varphi(p_\ell)$ parallel applications of $\mathrm{CRT}_{p_\ell}^*$, in sequence for each $\ell$, which can be done in a total of $O(n \log(\mathrm{rad}(m)))$ scalar operations.

It remains to explain how to sample a spherical Gaussian from $H'$. For this it suffices to give a unitary basis matrix $B'$ of $H'$, which allows us to generate a Gaussian over $H'$ as $B'\mathbf{c}$, where $\mathbf{c}$ is real Gaussian. Now, observe that the subspace $H'$ is

\[ H' = \{ \mathbf{x} \in \mathbb{C}^{\mathbb{Z}_m^*} : C^*\mathbf{x} \in \mathbb{R}^{[\varphi(m)]} \}, \]

because $H'$ is a real vector space of dimension $n$, and $C^*H' = \mathbb{R}^{[\varphi(m)]}$. So it suffices to give a unitary matrix $B'$ such that $C^*B'$ is real. By the mixed-product property, such a matrix is

\[ B' = \bigotimes_\ell (B'_{p_\ell} \otimes I_{[m_\ell']}), \]

where $B'_{p_\ell} = \frac{1}{\sqrt{2}} \begin{pmatrix} I & \sqrt{-1}\,J \\ J & -\sqrt{-1}\,I \end{pmatrix}$ for $p_\ell > 2$, and is the scalar identity for $p_\ell = 2$. Clearly, multiplication by $B'$ is a simple linear-time operation in the dimension.

Finally, we remark that because the final vector of decoding basis coefficients is $C^*B'\mathbf{c}$ for a real Gaussian $\mathbf{c}$, it is possible to generate these coefficients using just real arithmetic as $D\mathbf{c}$, where $D = \bigotimes_\ell (D_{p_\ell} \otimes I_{[m_\ell']})$ and $D_{p_\ell} = \mathrm{CRT}_{p_\ell}^* \cdot B'_{p_\ell}$ is a real $\varphi(p_\ell)$-by-$\varphi(p_\ell)$ matrix.
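The basic sampling route (via Equation (6.1)) can be sketched for a prime index $m = p > 2$, where no DFT or twiddle factors arise. The sketch assumes the convention that Gaussian parameter $s$ corresponds to standard deviation $s/\sqrt{2\pi}$ per real coordinate, pairs coordinate $i$ with its conjugate partner $p - i$, and uses a dense product with $\mathrm{CRT}_p^*$ in place of a fast transform; the function name is hypothetical.

```python
import cmath
import math
import random

def sample_decoding_coeffs(p, s, rng):
    """Sample sigma(a) in H (prime p > 2), then return CRT_p^* * sigma(a)."""
    n = p - 1
    std = s / math.sqrt(2 * math.pi)       # assumed parameter-to-deviation convention
    c = [rng.gauss(0.0, std) for _ in range(n)]
    # sigma(a) = B c: coordinate i and its partner p - i are complex conjugates.
    x = {}
    for t in range(n // 2):
        z = complex(c[t], c[n - 1 - t]) / math.sqrt(2)
        x[t + 1], x[p - (t + 1)] = z, z.conjugate()
    omega = cmath.exp(2j * math.pi / p)
    # Column j of CRT_p has entries omega^(i*j), so CRT_p^* conjugates them.
    return [sum((omega ** (i * j)).conjugate() * x[i] for i in range(1, p))
            for j in range(n)]

coeffs = sample_decoding_coeffs(7, 1.0, random.Random(0))
```

Because the sampled vector is conjugate-symmetric, each returned coefficient is real up to floating-point error, as expected for decoding-basis coefficients of an element of $K_{\mathbb{R}}$. The faster method in the text would additionally skip the unitary factors when $\mathrm{rad}(m) \ll m$, which this sketch does not attempt.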


7  Regularity 


In this section we prove a certain “regularity lemma” that is useful in cryptographic applications, such as when adapting the “primal” [Reg05] and “dual” [GPV08] LWE-based cryptosystems, and the identity-based versions of the latter scheme, to ring-LWE. (A later section gives such an adaptation of the dual cryptosystem.) Independently, a closely related statement, specialized to power-of-2 cyclotomics, was recently shown in [SS11] with a different style of proof.

The theorem says the following. Assume we are working with the $m$th cyclotomic of degree $n = \varphi(m)$, and let $q > 1$ be a prime integer. Let $a_1, \ldots, a_{\ell-1}$ be chosen uniformly and independently from $R_q$. Then, with high probability over the choice of the $a_i$, the distribution of $b_0 + \sum_{i=1}^{\ell-1} b_i a_i$ is within statistical distance $2^{-\Omega(n)}$ of uniform, where the $b_i$ are chosen from a discrete Gaussian distribution on $R$ of width essentially $n \cdot q^{1/\ell}$ (in the canonical embedding). Equivalently, the lemma says that if $a_0$ is any fixed invertible element of $R_q$ and $a_1, \ldots, a_{\ell-1}$ are uniformly and independently chosen from $R_q$, then $\sum_{i=0}^{\ell-1} b_i a_i$ is within $2^{-\Omega(n)}$ of uniform, where the $b_i$ are chosen as before. The equivalence follows by simply dividing by $a_0$. (The lemma we prove is actually more general, and applies to the joint distribution of $k \geq 1$ sums as above; see Theorem 7.4 and Corollary 7.5 for the exact statement.)

13 Here and in what follows, we identify the index set $\mathbb{Z}_m^*$ with the set $\prod_\ell (\mathbb{Z}_{p_\ell}^* \times [m_\ell'])$ as in the decomposition of $\mathrm{CRT}_m$, and similarly identify $[\varphi(m)]$ with $\prod_\ell [\varphi(m_\ell)]$.

This regularity statement is already interesting and non-trivial when $\ell$ is as small as 2, and is close to being tight: for instance, when $m$ is a power of 2, a width of at least $\sqrt{n}\,q^{1/\ell}$ is required just for entropy reasons. To see this, recall that $R$ is a rotation of $\sqrt{n}\,\mathbb{Z}^n$, so roughly speaking, a discrete Gaussian of width $t$ covers $(t/\sqrt{n})^n$ points.

One might wonder about the significance of the $b_0$ term, and why we do not analyze the regularity of $\sum_{i=1}^{\ell} b_i a_i$ when all the $a_i$ are chosen uniformly from $R_q$. In fact, a regularity lemma for exactly such sums was shown by Micciancio [Mic02]. (His work is specialized to the ring $R = \mathbb{Z}[x]/(x^n - 1)$, but can be extended to other rings, as observed in [SSTX09].) Unfortunately, such sums have a much worse regularity property, and in particular require super-constant $\ell$ to get negligible distance to uniformity. To see why this is the case, assume that $q$ is a prime satisfying $q = 1 \bmod m$, so that $\langle q \rangle$ splits completely into $n$ ideals of norm $q$ each. Letting $\mathfrak{q}$ denote one of these prime factors, notice that with probability $q^{-\ell}$, all the $a_i$ are in $\mathfrak{q}$. In this case, $\sum_{i=1}^{\ell} b_i a_i$ is in $\mathfrak{q}$ with certainty, and its distribution is therefore very far from uniform. By adding the $b_0$ term we avoid this “common divisor” problem and get much better regularity, providing exponentially small distance to uniformity already for $\ell$ as small as 2. It is also worth mentioning that including the $b_0$ term (or equivalently, requiring $a_0$ to be uniform) corresponds to the “normal form” of ring-LWE and ring-SIS.

We  start  with  a  technical  claim  on  the  Gaussian  weight  on  a  lattice. 

Claim  7.1.  For  any  n-dimensional  lattice  A  and  e,r  >  0, 

Pi/r (A)  <  max  ^1,  ^  ^  (1  +  e). 


Proof.  For  r  >  %(AV).  the  claim  follows  from  Definition  2.5  For  r  <  //  =  r/e(Av),  it  follows  from  the 
Poisson  summation  formula  (see  jMR04l  Fenima  2.8])  that 


Pi/r(A)  =  (det  A)  ■  r  n  ■  pr (A  )  <  (det  A)  ■  r  n  ■  pv(A  )  =  (r//r)n  •  pi/l?(A), 

and  the  claim  follows  from  the  previous  case.  □ 

Using Lemma 2.6 and Lemma 2.14 we have

\[ \eta_{2^{-2n}}(\mathcal{I}^\vee) \le \sqrt{n}/\lambda_1(\mathcal{I}) \le (\mathrm{N}(\mathcal{I}))^{-1/n}, \]

which implies the following corollary.

Corollary 7.2. For any ideal $\mathcal{I}$ and $r > 0$,

\[ \rho_{1/r}(\mathcal{I}) \le \max\bigl(1, \mathrm{N}(\mathcal{I})^{-1} r^{-n}\bigr)(1 + 2^{-2n}). \]

We will also need the following algebraic claim.

Claim 7.3. In the $m$th cyclotomic number field of degree $n$, for any $q, k \ge 1$,

\[ \sum_{\mathcal{J} \mid \langle q \rangle} \mathrm{N}(\mathcal{J})^k \le \exp(c)\, q^{kn} \le q^{kn+2}, \]

where $c$ is the number of distinct prime integer divisors of $q$.


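Proof. The second inequality is clear. For the first inequality, it suffices to consider the case of a prime power $q = p^e$. Indeed, if $q_1$ and $q_2$ are coprime then

\[ \sum_{\mathcal{J} \mid \langle q_1 q_2 \rangle} \mathrm{N}(\mathcal{J})^k = \Bigl(\sum_{\mathcal{J} \mid \langle q_1 \rangle} \mathrm{N}(\mathcal{J})^k\Bigr)\Bigl(\sum_{\mathcal{J} \mid \langle q_2 \rangle} \mathrm{N}(\mathcal{J})^k\Bigr). \]

Next, recall from Section 2.5.5 that for any integer prime $p$, the ideal $\langle p \rangle$ factors as $\mathfrak{p}_1^h \cdots \mathfrak{p}_g^h$ where $h = \varphi(p^d)$, $d \ge 0$ is the largest integer such that $p^d$ divides $m$, each $\mathfrak{p}_i$ is of norm $p^f$ where $f \ge 1$ is the multiplicative order of $p$ modulo $m/p^d$, and $g = n/(hf)$. Therefore, $\langle q \rangle = \mathfrak{p}_1^{eh} \cdots \mathfrak{p}_g^{eh}$, and

\begin{align*}
\sum_{\mathcal{J} \mid \langle q \rangle} \mathrm{N}(\mathcal{J})^k &= \prod_{i=1}^{g} \bigl(1 + \mathrm{N}(\mathfrak{p}_i)^k + \cdots + \mathrm{N}(\mathfrak{p}_i)^{ehk}\bigr) \\
&= \bigl(1 + p^{fk} + \cdots + p^{ehfk}\bigr)^g \\
&\le p^{ehfkg}\,(1 - p^{-fk})^{-g} \\
&\le q^{nk} \exp(g \cdot p^{-fk}).
\end{align*}

Next, observe that $p^f$ is greater than $m/p^d$ (since it is greater than 1 and equals 1 modulo $m/p^d$) and that $g \le n/\varphi(p^d) = \varphi(m/p^d)$, hence

\[ g \cdot p^{-fk} \le g \cdot p^{-f} \le 1, \]

which completes the proof. □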
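As a numerical sanity check of Claim 7.3 in the prime-power case, the following sketch computes $h$, $f$, and $g$ from $m$ and $p$ as in the factorization recalled above, evaluates the divisor sum $(1 + p^{fk} + \cdots + p^{ehfk})^g$ exactly, and compares it with $\exp(1) \cdot q^{kn}$. The function names and the sample parameters are hypothetical; the comparison uses the rational lower bound $2.71828 < e$ so that only exact integer arithmetic is involved.

```python
from math import gcd

def euler_phi(x):
    """Euler's totient by direct counting (fine for small x)."""
    return sum(1 for i in range(1, x + 1) if gcd(i, x) == 1)

def mult_order(a, modulus):
    """Multiplicative order of a modulo modulus (taken to be 1 when modulus == 1)."""
    if modulus == 1:
        return 1
    x, order = a % modulus, 1
    while x != 1:
        x = (x * a) % modulus
        order += 1
    return order

def splitting_params(m, p):
    """Return (n, h, f, g) describing how <p> factors in the m-th cyclotomic ring."""
    d, m_rest = 0, m
    while m_rest % p == 0:
        m_rest //= p
        d += 1
    n = euler_phi(m)
    h = euler_phi(p ** d)
    f = mult_order(p, m_rest)
    g = n // (h * f)
    return n, h, f, g

def divisor_norm_sum(m, p, e, k):
    """Sum of N(J)^k over all ideals J dividing <p^e>."""
    _, h, f, g = splitting_params(m, p)
    per_prime = sum(p ** (f * k * j) for j in range(e * h + 1))
    return per_prime ** g

def claim_holds(m, p, e, k):
    """Check divisor sum <= exp(1) * q^(kn) for q = p^e, using 2.71828 < exp(1)."""
    n = euler_phi(m)
    q = p ** e
    return divisor_norm_sum(m, p, e, k) * 100000 <= 271828 * q ** (k * n)

print(divisor_norm_sum(7, 2, 1, 1))  # 81 = 1 + 8 + 8 + 64
```

For $m = 7$, $p = 2$ the ideal $\langle 2 \rangle$ is a product of two primes of norm 8, so its divisors have norms 1, 8, 8, and 64, matching the printed value.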


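The following is the regularity theorem. Here, for a matrix $A \in R_q^{[k] \times [\ell]}$ we define

\[ \Lambda^\perp(A) = \{ z \in R^{[\ell]} : Az = 0 \bmod qR \}, \]

which we identify with a lattice in $H^{\ell}$. Its dual lattice (which is again a lattice in $H^{\ell}$) is denoted by $\Lambda^\perp(A)^\vee$.

Theorem 7.4. Let $R$ be the ring of integers in the $m$th cyclotomic number field $K$ of degree $n$, and $q \ge 2$ an integer. For positive integers $k \le \ell \le \mathrm{poly}(n)$, let $A = [I_{[k]} \mid \bar{A}] \in (R_q)^{[k] \times [\ell]}$, where $I_{[k]} \in (R_q)^{[k] \times [k]}$ is the identity matrix and $\bar{A} \in (R_q)^{[k] \times [\ell - k]}$ is uniformly random. Then for all $r \ge 2n$,

\[ \mathbb{E}_{\bar{A}}\bigl[\rho_{1/r}(\Lambda^\perp(A)^\vee)\bigr] \le 1 + 2 (r/n)^{-n\ell} q^{kn+2} + 2^{-\Omega(n)}. \]

In particular, if $r > 2n \cdot q^{k/\ell + 2/(n\ell)}$ then $\mathbb{E}_{\bar{A}}[\rho_{1/r}(\Lambda^\perp(A)^\vee)] \le 1 + 2^{-\Omega(n)}$, and so by Markov's inequality, $\eta_{2^{-\Omega(n)}}(\Lambda^\perp(A)) \le r$ except with probability at most $2^{-\Omega(n)}$.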


Using Lemma 2.7 and the fact that $A$ contains an identity submatrix $I_{[k]}$ and so the columns of $A$ generate all of $R_q^{[k]}$, we obtain the following corollary, which is often more useful in applications.

Corollary 7.5. Let $R$, $n$, $q$, $k$, and $\ell$ be as in Theorem 7.4. Assume that $A = [I_{[k]} \mid \bar{A}] \in (R_q)^{[k] \times [\ell]}$ is chosen as in Theorem 7.4. Then, with probability $1 - 2^{-\Omega(n)}$ over the choice of $\bar{A}$, the distribution of $Ax \in R_q^{[k]}$, where each coordinate of $x \in R^{[\ell]}$ is chosen from a discrete Gaussian distribution of parameter $r > 2n \cdot q^{k/\ell + 2/(n\ell)}$ over $R$, satisfies that the probability of each of the $q^{nk}$ possible outcomes is in the interval $(1 \pm 2^{-\Omega(n)})\, q^{-nk}$ (and in particular is within statistical distance $2^{-\Omega(n)}$ of the uniform distribution over $R_q^{[k]}$).
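To get a feel for the Gaussian parameter required by Theorem 7.4 and Corollary 7.5, the threshold $2n \cdot q^{k/\ell + 2/(n\ell)}$ can be evaluated directly. The helper below is a hypothetical convenience function (its name and the sample parameters are not from the text); it uses floating point, so it is a rough guide only.

```python
import math

def regularity_threshold(n, q, k, ell):
    """Value 2n * q^(k/ell + 2/(n*ell)); the Gaussian parameter r must exceed it."""
    if not (1 <= k <= ell):
        raise ValueError("need 1 <= k <= ell")
    return 2 * n * q ** (k / ell + 2 / (n * ell))

# Small worked case: n = 4, q = 16, k = 1, ell = 2 gives 8 * 16^(3/4) = 64.
print(regularity_threshold(4, 16, 1, 2))

# Increasing ell lowers the threshold, since the exponent of q shrinks.
for ell in (2, 3, 4):
    print(ell, regularity_threshold(256, 12289, 1, ell))
```

The loop illustrates the trade-off in the theorem: more columns $\ell$ allow a narrower Gaussian, at the cost of a larger matrix $A$.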

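Proof of Theorem 7.4. Observe that for any $A \in R_q^{[k] \times [\ell]}$, the dual lattice of $\Lambda^\perp(A)$ is

\[ \Lambda^\perp(A)^\vee = (R^\vee)^{[\ell]} + \bigl\{ \tfrac{1}{q} A^T s : s \in (R_q^\vee)^{[k]} \bigr\}. \]

We therefore have

\begin{align*}
\mathbb{E}_{\bar{A}}\bigl[\rho_{1/r}(\Lambda^\perp(A)^\vee)\bigr]
&= \mathbb{E}_{\bar{A}}\Bigl[\sum_{s \in (R_q^\vee)^{[k]}} \rho_{1/r}\bigl((R^\vee)^{[\ell]} + \tfrac{1}{q} A^T s\bigr)\Bigr] \\
&= \sum_{s \in (R_q^\vee)^{[k]}} \rho_{1/r}\bigl((R^\vee)^{[k]} + \tfrac{1}{q} s\bigr) \cdot \mathbb{E}_{a}\bigl[\rho_{1/r}\bigl(R^\vee + \tfrac{1}{q}\langle a, s \rangle\bigr)\bigr]^{\ell - k}, \tag{7.1}
\end{align*}

where $a$ is chosen uniformly from $R_q^{[k]}$. For any $s = (s_1, \ldots, s_k) \in (R_q^\vee)^{[k]}$, define the ideal $\mathcal{I}_s = s_1 R + \cdots + s_k R + qR^\vee \subseteq R^\vee$; this is the "greatest common divisor" ideal of all the $s_i$ and $qR^\vee$. Note that $\langle a, s \rangle$ is uniformly random over $\mathcal{I}_s/qR^\vee$, and so the expectation above is

\[ |\mathcal{I}_s/qR^\vee|^{-1} \cdot \rho_{1/r}\bigl(\tfrac{1}{q}\mathcal{I}_s\bigr). \]

Therefore, if we let $T$ denote the set of all ideals $\mathcal{J}$ satisfying $qR^\vee \subseteq \mathcal{J} \subseteq R^\vee$, we can write (7.1) as

\begin{align*}
&\sum_{\mathcal{J} \in T} |\mathcal{J}/qR^\vee|^{-(\ell-k)} \cdot \rho_{1/r}\bigl(\tfrac{1}{q}\mathcal{J}\bigr)^{\ell-k} \sum_{s \text{ s.t. } \mathcal{I}_s = \mathcal{J}} \rho_{1/r}\bigl((R^\vee)^{[k]} + \tfrac{1}{q} s\bigr) \\
&\quad\le \rho_{1/r}(R^\vee)^{\ell} + \sum_{\mathcal{J} \in T \setminus \{qR^\vee\}} |\mathcal{J}/qR^\vee|^{-(\ell-k)} \cdot \rho_{1/r}\bigl(\tfrac{1}{q}\mathcal{J}\bigr)^{\ell-k} \cdot \bigl(\rho_{1/r}\bigl(\tfrac{1}{q}\mathcal{J}\bigr)^{k} - 1\bigr) \\
&\quad\le \rho_{1/r}(R^\vee)^{\ell} + \sum_{\mathcal{J} \in T \setminus \{qR^\vee\}} |\mathcal{J}/qR^\vee|^{-(\ell-k)} \cdot \bigl(\rho_{1/r}\bigl(\tfrac{1}{q}\mathcal{J}\bigr)^{\ell} - 1\bigr) \\
&\quad= 1 + \sum_{\mathcal{J} \in T} |\mathcal{J}/qR^\vee|^{-(\ell-k)} \cdot \bigl(\rho_{1/r}\bigl(\tfrac{1}{q}\mathcal{J}\bigr)^{\ell} - 1\bigr), \tag{7.2}
\end{align*}

where in the first inequality we used the fact that for every $\mathcal{J} \in T \setminus \{qR^\vee\}$, the sets $(R^\vee)^{[k]} + \tfrac{1}{q} s$ for all $s$ satisfying $\mathcal{I}_s = \mathcal{J}$ are disjoint, and their union is contained in $(\tfrac{1}{q}\mathcal{J})^{[k]} \setminus \{0\}$. Next, using Corollary 7.2 we see that

\begin{align*}
\rho_{1/r}\bigl(\tfrac{1}{q}\mathcal{J}\bigr)^{\ell} &\le \max\bigl(1, (|\mathcal{J}/qR^\vee| \cdot \Delta_K r^{-n})^{\ell}\bigr)(1 + 2^{-2n})^{\ell} \\
&\le 1 + \ell\, 2^{1-2n} + 2\bigl(|\mathcal{J}/qR^\vee| \cdot \Delta_K r^{-n}\bigr)^{\ell}.
\end{align*}

This, together with Claim 7.3 and (2.8), allows us to bound (7.2) by

\begin{align*}
1 + 2^{-\Omega(n)} + 2\Delta_K^{\ell} r^{-n\ell} \sum_{\mathcal{J} \in T} |\mathcal{J}/qR^\vee|^{k}
\le 1 + 2^{-\Omega(n)} + 2 (r/n)^{-n\ell} q^{kn+2},
\end{align*}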

and the theorem follows. □

8 Cryptosystems

Here  we  give  three  example  applications  of  our  toolkit,  which  all  work  in  arbitrary  cyclotomic  rings: 


•  In Section 8.1 we give a simple adaptation of the "dual" LWE-based public-key cryptosystem of [GPV08], which uses our regularity lemma of Section 7 and which can serve as a foundation for (hierarchical) identity-based encryption;

•  In Section 8.2 we give a public-key cryptosystem with more compact public keys and ciphertexts (of only two ring elements each), analogous to the ones of [LPS10, LP11];

•  In Section 8.3, we describe a symmetric-key "somewhat homomorphic" cryptosystem and associated "modulus reduction" and "key switching" algorithms.


We  emphasize  that  throughout  this  section,  the  cryptosystems  and  associated  operations  are  defined 
almost  entirely  in  an  implementation-  and  basis-independent  manner,  using  just  abstract  mathematical  objects 
and  operations  (e.g.,  ring  addition  and  multiplication,  cosets  of  ideals  and  probability  distributions  over  them, 
etc.).  All  of  the  operations  can  be  performed  very  efficiently  using  the  algorithms  described  earlier  in  the 
paper. 

In particular, our cryptosystems need to sample from subgaussian distributions over cosets of $R^\vee$ (or a scaling of it). For this purpose we can use any valid discretization $\lfloor \cdot \rceil$ as described in Section 2.4.2, applied to any continuous error distribution $\psi$ over $K_{\mathbb{R}}$. The choice of discretization affects only the resulting subgaussian parameter of the sample. For example, we can use the "coordinate-wise randomized rounding" method with the decoding basis $\vec{d}$ of $R^\vee$, which gives good subgaussian bounds (see Lemma 6.2).


8.1  Dual-Style  Cryptosystem 


In this section we present the ring-based variant of what is commonly called "dual" LWE encryption, first introduced in [GPV08] for the purposes of constructing identity-based encryption schemes. (The name "dual" refers to the fact that the system has dual properties to Regev's first LWE-based cryptosystem [Reg05], namely, the public key is statistically close to uniform, whereas ciphertexts are only pseudorandom and have unique encryption randomness.)

Let $R$ denote the $m$th cyclotomic ring (of degree $n = \varphi(m)$) and let $p$ and $q$ be coprime integers, where $p$ defines the message space $R_p$ and $q$ is the ring-LWE modulus. Let $\psi$ be a continuous LWE error distribution over $K_{\mathbb{R}}$, and let $\lfloor \cdot \rceil$ denote a valid discretization to (cosets of) $R^\vee$ or $pR^\vee$. In the key-generation algorithm we need to sample from the discrete Gaussian distribution $D_{R,r}$ for some $r \ge \sqrt{n} \cdot \omega(\sqrt{\log n})$; we can do so using the algorithm from Lemma 2.9 with the powerful basis $\vec{p}$ of $R$, since by Claim 4.2 its (Gram-Schmidt orthogonalized) elements have maximum length $\sqrt{n}$. We also let $\ell \ge 2$ be a parameter.

The  cryptosystem  is  defined  as  follows. 


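•  Gen: choose $a_0 = -1 \in R_q$ and uniformly random and independent $a_1, \ldots, a_{\ell-1} \in R_q$, and independent $x_0, \ldots, x_{\ell-1} \leftarrow D_{R,r}$. Output $a = (a_1, \ldots, a_{\ell-1}, a_\ell = -\sum_{i=0}^{\ell-1} a_i x_i) \in R_q^{\{1,\ldots,\ell\}}$ as the public key, and $x = (x_1, \ldots, x_{\ell-1}, x_\ell = 1) \in R^{\{1,\ldots,\ell\}}$ as the secret key. Note that $\langle a, x \rangle = x_0 \in R_q$, by construction.

•  $\mathrm{Enc}_a(\mu \in R_p)$: choose independent $e_0, e_1, \ldots, e_{\ell-1} \leftarrow \lfloor p \cdot \psi \rceil_{pR^\vee}$, and $e_\ell \leftarrow \lfloor p \cdot \psi \rceil_{t^{-1}\mu + pR^\vee}$. Let $e = (e_1, \ldots, e_\ell) \in (R^\vee)^{\{1,\ldots,\ell\}}$. Output ciphertext $c = e_0 \cdot a + e \in (R_q^\vee)^{\{1,\ldots,\ell\}}$.

•  $\mathrm{Dec}_x(c)$: compute $d = [\![ \langle c, x \rangle ]\!] \in R^\vee$ (see Definition 6.4), and output $\mu = t \cdot d \bmod pR$.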

Lemma 8.1. If $r > 2n \cdot q^{1/\ell + 2/(n\ell)}$ then the above cryptosystem is IND-CPA secure assuming the hardness of $R\text{-}\mathrm{DLWE}_{q,\psi}$ given $\ell + 1$ samples.

Proof. By Corollary 7.5 (with $k = 1$), the public key $a$ is within statistical distance $2^{-\Omega(n)}$ of the uniform distribution over $R_q^{\{1,\ldots,\ell\}}$. By Lemma 2.23 and Lemma 2.24 it follows that for any message $\mu$ (chosen adversarially given $a$), the ciphertext $c = e_0 \cdot a + e$ is computationally indistinguishable from uniform and independent of the public key, under the hardness assumption. □

Lemma 8.2. Suppose that for any $c \in R_p^\vee$, $\lfloor p \cdot \psi \rceil_{c + pR^\vee}$ is $\delta$-subgaussian with parameter $s$ for some $\delta = O(1/\ell)$, and $q \ge s\sqrt{(r^2 \ell + 1) n} \cdot \omega(\sqrt{\log n})$. Then decryption is correct with probability $1 - \mathrm{negl}(n)$ over all the randomness of key generation and encryption.

In particular, if $\psi$ is a continuous Gaussian with parameter $s' \ge 1$, and we use coordinate-wise randomized rounding in the decoding basis for discretization, then by the discussion in Section 2.4.2 and the equality $s_1(\vec{d}) = \sqrt{\mathrm{rad}(m)/m}$ from Lemma 6.2 we have that $\lfloor p \cdot \psi \rceil_{c + pR^\vee}$ is 0-subgaussian with parameter $s = p\sqrt{s'^2 + 2\pi\,\mathrm{rad}(m)/m} = O(ps')$.

Proof. By construction, $\langle c, x \rangle = e_0 x_0 + \langle e, x \rangle = \langle e', x' \rangle \bmod qR^\vee$, where $e' = (e_0, e_1, \ldots, e_\ell) \in (R^\vee)^{[\ell+1]}$ and $x' = (x_0, x_1, \ldots, x_\ell = 1) \in R^{[\ell+1]}$. Furthermore, $\langle e', x' \rangle = t^{-1}\mu \bmod pR^\vee$, so decryption is correct as long as $[\![ \langle e', x' \rangle \bmod qR^\vee ]\!] = \langle e', x' \rangle \in R^\vee$. We next show that this holds with probability $1 - \mathrm{negl}(n)$ over the choice of $e', x'$.

By Lemma 2.8, for each $i \in [\ell]$ we have $\|x_i\|_2 \le r\sqrt{n}$ except with probability at most $2^{-n} = \mathrm{negl}(n)$, and $\|x_\ell\|_2 = \|1\|_2 = \sqrt{n}$. Then by Item 2 of Lemma 6.6 (with $k = 1$, $l = 0$), for every $i \in [\ell]$ each coefficient of $e_i x_i$ when represented in the decoding basis is $\delta$-subgaussian with parameter $s r \sqrt{n}$, and each one of $e_\ell x_\ell$ is $\delta$-subgaussian with parameter $s\sqrt{n}$. Since the $e_i$ are mutually independent, each decoding-basis coefficient of $\langle e', x' \rangle$ is $\delta(\ell+1)$-subgaussian with parameter $s\sqrt{(r^2\ell + 1)n}$. Since $\delta(\ell+1) = O(1)$, the claim follows by Lemma 6.5. □


8.2  Compact  Public-Key  Cryptosystem 

As in the previous subsection, let $R$ denote the $m$th cyclotomic ring and let $p, q$ be coprime integers, where the message space is $R_p$. We also require $q$ to be coprime with every odd prime dividing $m$. Also let $\psi$ be a continuous LWE error distribution over $K_{\mathbb{R}}$, and let $\lfloor \cdot \rceil$ denote a valid discretization to (cosets of) $R^\vee$ or $pR^\vee$. The cryptosystem is defined as follows.

•  Gen: choose a uniformly random $a \leftarrow R_q$. Choose $x \leftarrow \lfloor \psi \rceil_{R^\vee}$ and $e \leftarrow \lfloor p \cdot \psi \rceil_{pR^\vee}$. Output $(a, b = \hat{m}(a \cdot x + e) \bmod qR) \in R_q \times R_q$ as the public key, and $x$ as the secret key. (Note that because $\hat{m} = t \cdot g$, $R^\vee = \langle t^{-1} \rangle$, and $a \cdot x + e \in R^\vee/qR^\vee$, we have $\hat{m}(a \cdot x + e) \in gR/gqR$, which is then reduced mod $qR$ to obtain $b \in R_q$.)

•  $\mathrm{Enc}_{(a,b)}(\mu \in R_p)$: choose $z \leftarrow \lfloor \psi \rceil_{R^\vee}$, $e' \leftarrow \lfloor p \cdot \psi \rceil_{pR^\vee}$, and $e'' \leftarrow \lfloor p \cdot \psi \rceil_{t^{-1}\mu + pR^\vee}$. Let $u = \hat{m}(z \cdot a + e') \bmod qR$ and $v = z \cdot b + e'' \in R_q^\vee$. Output $(u, v) \in R_q \times R_q^\vee$.

•  $\mathrm{Dec}_x(u, v)$: compute $v - u \cdot x = \hat{m}(e \cdot z - e' \cdot x) + e'' \bmod qR^\vee$, and decode it to $d = [\![ v - u \cdot x ]\!] \in R^\vee$ (see Definition 6.4). Output $\mu = t \cdot d \bmod pR$.

Lemma 8.3. The above cryptosystem is IND-CPA secure assuming the hardness of $R\text{-}\mathrm{DLWE}_{q,\psi}$.


Proof. The security proof follows from two applications of the ring-LWE assumption in its normal form (see Lemma 2.24), with secret drawn from $\lfloor \psi \rceil_{R^\vee}$. First, we claim that the public key is indistinguishable from uniform. Using the transformation from Lemma 2.23 with $w = 0$, we see that the pair $(a, a \cdot x + e) \in R_q \times R_q^\vee$, where $a, x, e$ are sampled as in the Gen procedure, is indistinguishable from uniform. Now consider the transformation that multiplies the second component by $\hat{m}$ and reduces the result modulo $qR$. This transformation maps pairs $(a, a \cdot x + e)$ distributed as before, to pairs in $R_q \times R_q$ distributed as the output of the Gen procedure. Moreover, since $\langle g \rangle$ and $\langle q \rangle$ are coprime by Corollary 2.18, and recalling that $\hat{m} R^\vee = gR$, we see that this transformation maps the uniform distribution over $R_q \times R_q^\vee$ to the uniform distribution over $R_q \times R_q$. This completes the proof of the first claim.

It remains to show that if the public key $(a, b)$ is uniformly random in $R_q \times R_q$, then for any message $\mu \in R_p$, the joint distribution of the public key together with $\mathrm{Enc}_{(a,b)}(\mu)$ is computationally indistinguishable from uniform. To see this, consider a reduction that is given access to a distribution over $R_q \times K_{\mathbb{R}}/qR^\vee$ which is either $A_{z,\psi}$ (for $z \leftarrow \lfloor \psi \rceil_{R^\vee}$) or uniform. It obtains two samples $(a', u'')$ and $(b', v')$ from the distribution, and applies the transformation from Lemma 2.23 with $w = 0$ to $(a', u'')$ to obtain $(a, u')$, and with $w = t^{-1}\mu \in R_p^\vee$ to $(b', v')$ to obtain $(b, v)$. The reduction then outputs $(a, b)$ as the public key, and $(u = \hat{m} u' \bmod qR,\ v) \in R_q \times R_q^\vee$ as the encryption of $\mu$.

If the unknown distribution was uniform, then it follows that $(a, b, u, v)$ is uniform in $R_q^3 \times R_q^\vee$. (Showing that $u$ is uniform in $R_q$ is done as above, in the proof of the first claim.) On the other hand, if the unknown distribution is $A_{z,\psi}$, then $(a, b)$ has uniform distribution, and it can be verified that $(u, v)$ has the same distribution as generated by $\mathrm{Enc}_{(a,b)}(\mu)$. This completes the proof. □
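The Gen, Enc, and Dec procedures of this subsection can be exercised with a small, insecure toy sketch. It rests on several assumptions that are not part of the text: the index is a power of two ($m = 2n$), so that $g = 1$, $t = \hat{m} = n$, and $R^\vee = n^{-1}R$; every element of $R^\vee$ is stored as $t$ times itself, so all arithmetic happens in $\mathbb{Z}_q[X]/(X^n + 1)$; decoding is assumed to reduce to a centered lift of the coefficients; and small uniform noise stands in for the discretized error distribution $\psi$. All names and parameter values are hypothetical.

```python
import random

def negacyclic_mul(a, b, q):
    """Multiply two coefficient vectors in Z_q[X]/(X^n + 1)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] = (out[i + j] + ai * bj) % q
            else:  # X^n = -1 wraps around with a sign flip
                out[i + j - n] = (out[i + j - n] - ai * bj) % q
    return out

def small(n, rng):
    """Stand-in for a short error: coefficients uniform in {-1, 0, 1}."""
    return [rng.randint(-1, 1) for _ in range(n)]

def gen(n, p, q, rng):
    a = [rng.randrange(q) for _ in range(n)]
    x = small(n, rng)                        # scaled secret t*x
    e = [p * c for c in small(n, rng)]       # scaled error, a multiple of p
    b = [(ax + ei) % q for ax, ei in zip(negacyclic_mul(a, x, q), e)]
    return (a, b), x

def enc(pk, mu, p, q, rng):
    a, b = pk
    n = len(a)
    z = small(n, rng)
    e1 = [p * c for c in small(n, rng)]                       # in pR
    e2 = [m_i + p * c for m_i, c in zip(mu, small(n, rng))]   # in mu + pR
    u = [(za + ei) % q for za, ei in zip(negacyclic_mul(z, a, q), e1)]
    v = [(zb + ei) % q for zb, ei in zip(negacyclic_mul(z, b, q), e2)]
    return u, v

def dec(sk, ct, p, q):
    u, v = ct
    w = [(vi - ux) % q for vi, ux in zip(v, negacyclic_mul(u, sk, q))]
    centered = [c - q if c > q // 2 else c for c in w]  # lift to (-q/2, q/2]
    return [c % p for c in centered]

rng = random.Random(0)
n, p, q = 16, 2, 257
pk, sk = gen(n, p, q, rng)
mu = [rng.randrange(p) for _ in range(n)]
assert dec(sk, enc(pk, mu, p, q, rng), p, q) == mu
```

With these parameters every coefficient of the scaled noise $e \cdot z - e' \cdot x + e''$ has absolute value at most $2 \cdot 16 \cdot 2 + 3 = 67 < q/2$, so decryption in this toy never fails; realistic parameters would instead be chosen via the correctness lemma for this scheme.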


We  finally  show  that  under  suitable  parameters,  decryption  is  correct  with  overwhelming  probability. 

Lemma 8.4. Suppose that $\lfloor \psi \rceil_{R^\vee}$ outputs elements having $\ell_2$ norm bounded by $\ell$ with $1 - \mathrm{negl}(n)$ probability, that $\lfloor p \cdot \psi \rceil_{c + pR^\vee}$ (for any coset $c + pR^\vee$) is $\delta$-subgaussian with parameter $s$ for some $\delta = O(1)$, and that $q \ge s\sqrt{2(\hat{m}\ell)^2 + n} \cdot \omega(\sqrt{\log n})$. Then decryption is correct with probability $1 - \mathrm{negl}(n)$ over all the randomness of key generation and encryption.

In particular, and just as in the previous subsection, if $\psi$ is a continuous Gaussian with parameter $s' \ge 1$, and we use coordinate-wise randomized rounding in the decoding basis for discretization, then $\lfloor p \cdot \psi \rceil_{c + pR^\vee}$ is 0-subgaussian with parameter $s = p\sqrt{s'^2 + 2\pi\,\mathrm{rad}(m)/m} = O(ps')$. Moreover, by the fact that $\psi$ has $1 - 2^{-\Omega(n)}$ of its mass on vectors of length at most $s'\sqrt{n}$, and because discretization increases lengths by at most $s_1(\vec{d})\sqrt{n}$ (by the triangle inequality), we have that $\lfloor \psi \rceil_{R^\vee}$ outputs elements having norm bounded by $\ell := (s' + \sqrt{\mathrm{rad}(m)/m})\sqrt{n} = O(s'\sqrt{n})$, except with $\mathrm{negl}(n)$ probability.


Proof. By construction, $e, e' \in pR^\vee$ and $x, z \in R^\vee$, so $\hat{m}(e \cdot z - e' \cdot x) \in pR^\vee$. Therefore, $E := \hat{m}(e \cdot z - e' \cdot x) + e'' \in R^\vee$ satisfies $E = t^{-1}\mu \bmod pR^\vee$ when $e''$ is chosen as when encrypting $\mu$, so decryption is correct as long as $[\![ E \bmod qR^\vee ]\!] = E$. We next show that this holds with probability $1 - \mathrm{negl}(n)$.

By assumption, $\|x\|_2, \|z\|_2 \le \ell$ with probability $1 - \mathrm{negl}(n)$, and $e$, $e'$, and $e''$ are $\delta$-subgaussian with parameter $s$. Then by Item 2 of Lemma 6.6 (with $k = 1$, $l = 0$), each coefficient of $\hat{m} \cdot ez,\ \hat{m} \cdot e'x \in R^\vee$ when represented in the decoding basis is $\delta$-subgaussian with parameter $s\hat{m}\ell$, and those of $e''$ are $\delta$-subgaussian with parameter $s\sqrt{n}$. Since $e, e', e''$ are mutually independent, each decoding-basis coefficient of $E$ is $3\delta$-subgaussian with parameter $s\sqrt{2(\hat{m}\ell)^2 + n}$. The claim follows by Lemma 6.5. □

8.3  Symmetric-Key Homomorphic Cryptosystem

Here we define a symmetric-key cryptosystem that is "somewhat homomorphic," i.e., it supports limited additive and multiplicative homomorphic operations. It is essentially the Brakerski-Vaikuntanathan system [BV11b] based on ring-LWE, but with improved parameters and generalized to arbitrary cyclotomics, which introduces several technical challenges. We also describe generalized "key switching" (also known as degree reduction) and "modulus reduction" procedures akin to those first described for standard LWE in [BV11a], and for ring-LWE in power-of-2 cyclotomics in [BGV12]. (The techniques developed here can also be adapted to work with the "scale free" perspective adopted in [Bra12].) The scheme can also be made to support unbounded homomorphic operations using Gentry's "bootstrapping" technique [Gen09b, Gen09a], and also can be efficiently adapted to a public-key system using the regularity lemma from Section 7.


Description of the scheme. Let $R$ denote the $m$th cyclotomic ring (of degree $n = \varphi(m)$) and let $p$ and $q$ be coprime integers, where $p$ defines the message space $R_p$ and $q$ is the ring-LWE modulus. To support "degree reduction" (see Section 8.3.2 below), we also require $\langle p \rangle, \langle g \rangle \subseteq R$ to be coprime ideals, which is the case if and only if $p$ is coprime with all odd primes dividing $m$ (see Corollary 2.18).

The secret key is a ring element $s \in R$ chosen from a certain distribution (specifically, $t$ times the LWE error distribution over $R^\vee$; see below). We say that a ciphertext of degree $k \ge 1$ is a polynomial $c = c(S)$ of degree at most (and usually equal to) $k$ in an indeterminate $S$, having coefficients in $\mathcal{I}_q$ where $\mathcal{I} = (R^\vee)^k$. (Fresh ciphertexts produced by the encryption algorithm will have degree $k = 1$, whereas those produced by the homomorphic operations may have larger degree.) A ciphertext $c(S)$ encrypting a message $\mu \in R_p$ under secret key $s \in R$ satisfies the relation

\[ c(s) = e \bmod q\mathcal{I} \]

for some sufficiently "short" $e \in \mathcal{I}$ such that $e = t^{-k} \cdot \mu \bmod p\mathcal{I}$ (where "short" can refer to the $\ell_2$ norm, $\ell_\infty$ norm, or subgaussian parameter as needed). Therefore, given the secret key $s \in R$ one can compute $e = [\![ c(s) ]\!] \in \mathcal{I}$ and recover the message as $t^k \cdot e \bmod pR$. We refer to $e$ as the "noise" in the ciphertext, and its subgaussian parameter or $\ell_2$ norm determines the size of $q$ needed to ensure correct decryption with high probability, and the underlying hardness assumption. For each operation supported by the system, we give (nearly) tight bounds on the growth or shrinkage of the noise's subgaussian parameter and $\ell_2$ norm; these bounds can be combined in a modular way to calculate appropriate parameters for a particular application.

Throughout this subsection, let $\psi$ be a continuous LWE error distribution over $K_{\mathbb{R}}$, and let $\lfloor \cdot \rceil$ denote any valid discretization to cosets of some scaling of $R^\vee$ (e.g., using the decoding basis $\vec{d}$ of $R^\vee$). The cryptosystem is defined formally as follows.


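•  Gen: choose $s' \leftarrow \lfloor \psi \rceil_{R^\vee}$, and output $s = t \cdot s' \in R$ as the secret key.

•  $\mathrm{Enc}_s(\mu \in R_p)$: choose $e \leftarrow \lfloor p \cdot \psi \rceil_{t^{-1}\mu + pR^\vee}$. Let $c_0 = -c_1 \cdot s + e \in R_q^\vee$ for uniformly random $c_1 \leftarrow R_q^\vee$, and output the ciphertext $c(S) = c_0 + c_1 S$. The "noise" in $c(S)$ is defined to be $e$.

•  $\mathrm{Dec}_s(c(S))$ for $c$ of degree $k$: compute $c(s) \in (R^\vee)^k_q$, and decode it to $e = [\![ c(s) ]\!] \in (R^\vee)^k$. Output $\mu = t^k \cdot e \bmod pR$.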


The homomorphic operations are defined as follows. For ciphertexts $c, c'$ of arbitrary degrees $k, k'$ (respectively), their homomorphic product is the degree-$(k + k')$ ciphertext $c(S) \boxdot c'(S) = c(S) \cdot c'(S)$ (i.e., standard polynomial multiplication). The noise in the result is defined to be the product of the noise terms of $c, c'$. Similarly, for ciphertexts $c, c'$ of equal degree $k$, their homomorphic sum is defined as the degree-$k$ ciphertext $c(S) \boxplus c'(S) = c(S) + c'(S)$, and the noise in the resulting ciphertext is the sum of those of $c, c'$. (Observe that any degree-$k$ ciphertext resulting from these operations has coefficients in $(R^\vee)^k_q$, as required.) To homomorphically add two ciphertexts of different degrees, we must first homomorphically multiply the one having smaller degree by a fixed public encryption of $1 \in R_p$ enough times to match the larger degree.¹⁴ It is easy to verify that if the noise terms in all the ciphertexts are correctly decoded by the decryption algorithm, then its output is correct:

\begin{align*}
\mathrm{Dec}_s(\mathrm{Enc}_s(\mu)) &= \mu, \\
\mathrm{Dec}_s(c \boxplus c') &= \mathrm{Dec}_s(c) + \mathrm{Dec}_s(c') \bmod pR, \\
\mathrm{Dec}_s(c \boxdot c') &= \mathrm{Dec}_s(c) \cdot \mathrm{Dec}_s(c') \bmod pR.
\end{align*}
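The three decryption identities can be checked on a small, insecure toy sketch of the scheme. As an assumption made only for illustration, the index is taken to be a power of two ($m = 2n$, so $t = \hat{m} = n$ and $R^\vee = n^{-1}R$), every element of $(R^\vee)^k$ is stored after multiplication by $t^k$ so that all arithmetic happens in $\mathbb{Z}_q[X]/(X^n+1)$, decoding is assumed to become a centered lift, and bounded uniform noise replaces the discretized $\psi$. A ciphertext is a list of coefficient polynomials $[c_0, c_1, \ldots]$; all names and parameters are hypothetical.

```python
import random

def negacyclic_mul(a, b, q):
    """Multiply two coefficient vectors in Z_q[X]/(X^n + 1)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                out[i + j] = (out[i + j] + ai * bj) % q
            else:
                out[i + j - n] = (out[i + j - n] - ai * bj) % q
    return out

def poly_add(a, b, q):
    return [(x + y) % q for x, y in zip(a, b)]

def gen(n, rng):
    """Secret key s = t*s': a short element of R (here, coefficients in {-1,0,1})."""
    return [rng.randint(-1, 1) for _ in range(n)]

def enc(s, mu, p, q, rng):
    e = [m_i + p * rng.randint(-1, 1) for m_i in mu]   # noise in mu + pR
    c1 = [rng.randrange(q) for _ in range(len(s))]
    c0 = [(e_i - x) % q for e_i, x in zip(e, negacyclic_mul(c1, s, q))]
    return [c0, c1]                                     # c(S) = c0 + c1*S

def hom_add(c, d, q):
    assert len(c) == len(d), "sum is defined for equal degrees only"
    return [poly_add(x, y, q) for x, y in zip(c, d)]

def hom_mul(c, d, q):
    """Polynomial product in S; the degrees add."""
    n = len(c[0])
    out = [[0] * n for _ in range(len(c) + len(d) - 1)]
    for i, ci in enumerate(c):
        for j, dj in enumerate(d):
            out[i + j] = poly_add(out[i + j], negacyclic_mul(ci, dj, q), q)
    return out

def dec(s, c, p, q):
    """Evaluate c(s) mod q, lift to (-q/2, q/2], then reduce mod p."""
    n = len(s)
    acc, power = [0] * n, [1] + [0] * (n - 1)
    for coeff in c:
        acc = poly_add(acc, negacyclic_mul(coeff, power, q), q)
        power = negacyclic_mul(power, s, q)
    centered = [x - q if x > q // 2 else x for x in acc]
    return [x % p for x in centered]

rng = random.Random(1)
n, p, q = 8, 2, 257
s = gen(n, rng)
mu1 = [rng.randrange(p) for _ in range(n)]
mu2 = [rng.randrange(p) for _ in range(n)]
c1, c2 = enc(s, mu1, p, q, rng), enc(s, mu2, p, q, rng)
assert dec(s, c1, p, q) == mu1
assert dec(s, hom_add(c1, c2, q), p, q) == [(a + b) % p for a, b in zip(mu1, mu2)]
assert dec(s, hom_mul(c1, c2, q), p, q) == negacyclic_mul(mu1, mu2, p)
```

In this toy, fresh noise coefficients have absolute value at most 3, so a single product has noise at most $8 \cdot 3 \cdot 3 = 72 < q/2$ and always decrypts; deeper circuits would need a larger $q$, which is what the noise-growth lemmas quantify.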

The following lemma gives a sufficient condition for correct decoding to occur, and follows directly from Lemmas 6.5 and 6.6.

Lemma 8.5. Suppose the noise $e$ in a degree-$k$ ciphertext $c$ is $\delta$-subgaussian with parameter $r$ for some $\delta = O(1)$, and $q \ge r \cdot \hat{m}^{k-1}\sqrt{n} \cdot \omega(\sqrt{\log n})$. Then $\mathrm{Dec}_s(c)$ correctly recovers $e$ with probability $1 - \mathrm{negl}(n)$. Alternatively, if $q > 2\|e\|_2\, \hat{m}^{k-1}\sqrt{n}$, then $\mathrm{Dec}_s(c)$ recovers $e$ with certainty.

The next two lemmas give (nearly) tight bounds on the subgaussian parameter of the noise under the homomorphic operations. They follow directly from the definition of the noise term, the properties of subgaussian random variables (described in Section 2.3), and the triangle inequality.

Lemma 8.6. If the noise terms in ciphertexts $c_i$ are independent and $\delta_i$-subgaussian with parameters $r_i$ (respectively), then the noise in the ciphertext $\boxplus_i\, c_i$ is $(\sum_i \delta_i)$-subgaussian with parameter $(\sum_i r_i^2)^{1/2}$. Moreover, it is always the case that the $\ell_2$ and $\ell_\infty$ norms of the noise terms in $\boxplus_i\, c_i$ are at most the sums of those in the $c_i$.


Lemma 8.7. Let $e, e'$ be the noise terms in ciphertexts $c, c'$, respectively. Then the noise $e \cdot e'$ in the ciphertext $c \boxdot c'$ satisfies $\|e \cdot e'\| \le \|e\| \cdot \|e'\|_\infty$, where $\|\cdot\|$ denotes either the $\ell_2$ or $\ell_\infty$ norm. Moreover, if $e$ is $\delta$-subgaussian with parameter $r$, then the noise $e \cdot e'$ is $\delta$-subgaussian with parameter $r \cdot \|e'\|_\infty$. In particular, if $e'$ is $\delta$-subgaussian with parameter $r'$ and is independent of $e$, then $e \cdot e'$ is within $\mathrm{negl}(n)$ statistical distance of a $\delta$-subgaussian with parameter $r \cdot r' \cdot \omega(\sqrt{\log n})$.

Proof. The first claim follows directly from Equation (2.5), and the second one by the first part of Claim 2.4. For the last claim, by subgaussianity we have $\|e'\|_\infty \le r' \cdot \omega(\sqrt{\log n})$, except with $\mathrm{negl}(n)$ probability. □


Lemma 8.8. The above cryptosystem is IND-CPA secure assuming the hardness of $R\text{-}\mathrm{DLWE}_{q,\psi}$.

Proof. We describe a reduction that is given access to either an LWE distribution $A_{s',\psi}$ or the uniform distribution over $R_q \times K_{\mathbb{R}}/qR^\vee$. In the former case we can assume that the distribution is in normal form, i.e., the secret $s' \in R^\vee$ is distributed according to $\lfloor \psi \rceil_{R^\vee}$ (see Lemma 2.24). The reduction simulates an encryption oracle that in the former case implements the encryption algorithm $\mathrm{Enc}_s$ for secret key $s = t \cdot s' \in R$ (which is distributed according to the output of Gen), and in the latter case simply returns ciphertexts that are uniformly random and independent of the queried messages. This suffices to prove IND-CPA security.

To respond to an encryption query on message $\mu \in R_p$, the reduction draws a sample $(a', b') \in R_q \times K_{\mathbb{R}}/qR^\vee$ from the unknown distribution. It then applies the transformation from Lemma 2.23 with $w = t^{-1}\mu \in R_p^\vee$ to obtain $(a, b) \in R_q \times R_q^\vee$. It lets $c_1 = -t^{-1}a \in R_q^\vee$ and $c_0 = b$, and outputs the ciphertext $c(S) = c_0 + c_1 S$.

Suppose that the unknown distribution is the ring-LWE distribution $A_{s',\psi}$ for $s' \in R^\vee$, and let $s = t \cdot s' \in R$. Then by Lemma 2.23 the pair $(a, b)$ is such that $a$ is uniformly random in $R_q$, and $b = a \cdot s' + e = (t^{-1}a)s + e \bmod qR^\vee$, where $e \leftarrow \lfloor p \cdot \psi \rceil_{t^{-1}\mu + pR^\vee}$. Therefore, $c_1 = -t^{-1}a$ is uniformly random in $R_q^\vee$, and $c_0 = b = -c_1 \cdot s + e$, so $c(S)$ is distributed exactly according to $\mathrm{Enc}_s(\mu)$.

On the other hand, if the unknown distribution is the uniform distribution, then by Lemma 2.23 the pair $(a, b)$ is uniformly random and independent of $\mu$, and therefore so are the coefficients of the ciphertext $c(S)$. □

¹⁴ In particular, we can just multiply $c(S)$ by (an appropriate power of) $t^{-1} = g/\hat{m} \in R^\vee$. By definition of $g$, this element has norm $\|t^{-1}\|_\infty \le 2^{\ell}/\hat{m} \le 1$, where $\ell$ is the number of odd primes dividing $m$, so multiplication by $t^{-1}$ does not increase the $\ell_2$ norm or subgaussian parameter of the noise.


8.3.1  Modulus  Reduction 

The  modulus  reduction  procedure  changes  the  ciphertext  modulus  from  q  to  some  q'  <  q  (where  q'  is 
coprime  with  p),  and  outputs  a  ciphertext  that  encrypts  (essentially)  the  same  message,  and  whose  noise  term 
shrinks  nearly  proportionately.  The  procedure  works  best  and  is  simplest  to  describe  in  the  case  of  degree-1 
ciphertexts, which can always be obtained via the key switching procedure described below in Section 8.3.2.

The following operation is central to the modulus reduction procedure. Let $\mathcal{J}$ be an ideal and let $q, q', p$ be integers with both $q$ and $q'$ coprime to $p$. Let $v \in \mathbb{Z}_p$ be $v = q' \cdot q^{-1} \bmod p$. Define a randomized function $F_{\mathcal{J}} : \mathcal{J}_q \to K$ in the following way: given $x \in \mathcal{J}_q$ and some good basis of $\mathcal{J}$, sample a short (subgaussian) element from the coset $(v - q'/q) \cdot x + p\mathcal{J}$ using one of the valid methods described in Section 2.4.2, and let $F_{\mathcal{J}}(x)$ be the result. Note that the coset $(v - q'/q) \cdot x + p\mathcal{J}$ is well defined because $(v - q'/q)(q\mathcal{J}) = (vq - q')\mathcal{J} \subseteq p\mathcal{J}$. Also observe that for all $x \in \mathcal{J}_q$, we have $(q'/q)x + F_{\mathcal{J}}(x) \in \mathcal{J}_{q'}$ and $qF_{\mathcal{J}}(x) \in p\mathcal{J}$ with certainty.
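The two properties of $F_{\mathcal{J}}$ noted above can be checked coordinate by coordinate in a toy setting. The sketch below assumes, purely for illustration, that $\mathcal{J}$ is modeled as $\mathbb{Z}^n$ with the standard basis, so that sampling from the coset reduces to independent randomized rounding of each coordinate to $\text{target} + p\mathbb{Z}$ with mean zero. The function names and parameters are hypothetical, and `pow(q, -1, p)` requires Python 3.8 or later.

```python
import random
from fractions import Fraction

def scale_factor(q, q_new, p):
    """v = q' * q^{-1} mod p (requires gcd(q, p) = 1)."""
    return (q_new * pow(q, -1, p)) % p

def round_to_coset(target, p, rng):
    """Zero-mean randomized rounding: returns an element of target + p*Z in (-p, p)."""
    r = target % p                      # representative in [0, p)
    if rng.random() < float(r / p):     # pick r - p with probability r/p
        return r - p
    return r

def F(x, q, q_new, p, rng):
    """Toy F_J for J = Z^n: a short element of (v - q'/q) * x + p*Z^n."""
    v = scale_factor(q, q_new, p)
    return [round_to_coset((v - Fraction(q_new, q)) * xi, p, rng) for xi in x]

def switch(x, q, q_new, p, rng):
    """Return ((q'/q) * x + F(x)) mod q' together with the rounding term F(x)."""
    f = F(x, q, q_new, p, rng)
    y = [Fraction(q_new, q) * xi + fi for xi, fi in zip(x, f)]
    assert all(yi.denominator == 1 for yi in y)   # lands back in J = Z^n
    return [int(yi) % q_new for yi in y], f

rng = random.Random(7)
q, q_new, p = 257, 16, 3
x = [rng.randrange(q) for _ in range(8)]
y, f = switch(x, q, q_new, p, rng)
# q * F(x) lies in p*J, and each rounding term is short.
assert all((q * fi).denominator == 1 and int(q * fi) % p == 0 for fi in f)
assert all(abs(fi) < p for fi in f)
```

Each output coordinate equals $(q'/q)x_i$ plus a rounding term of magnitude below $p$, which mirrors why the rounding contribution to the noise after modulus reduction does not depend on the size of $q$.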

We now describe the modulus reduction procedure. Let $c(S) = c_0 + c_1 S$ be an input ciphertext, with $c_0, c_1 \in R_q^\vee$. Let $f_0 \leftarrow F_{R^\vee}(c_0)$ and $f_1 \leftarrow t^{-1} \cdot F_R(t \cdot c_1)$, where we use coordinate-wise randomized rounding with the decoding basis $\vec{d}$ of $R^\vee$ for the former, and with the powerful basis $\vec{p}$ of $R$ for the latter. The output is the ciphertext $c'(S) = c'_0 + c'_1 S$, where

\[ c'_0 = \frac{q'}{q} c_0 + f_0 \bmod q'R^\vee, \qquad c'_1 = \frac{q'}{q} c_1 + f_1 \bmod q'R^\vee. \]

Notice that by the first of the above properties, we have $c'_0, c'_1 \in R^\vee_{q'}$, as required. Notice also that if $s = t \cdot s' \in R$ is the secret key and $e$ is the noise in $c(S)$, so that $c_0 + c_1 s = e \bmod qR^\vee$, then

\[ c'_0 + c'_1 s = \frac{q'}{q}(c_0 + c_1 s) + (f_0 + f_1 s) = \frac{q'}{q} e + \bigl(f_0 + (t f_1) \cdot s'\bigr) \bmod q'R^\vee. \tag{8.1} \]

Accordingly, we define the noise in the ciphertext $c'(S)$ to be $e' = (q'/q)e + (f_0 + f_1 s)$, which is in $R^\vee$ because $c'_0, c'_1 \in R^\vee_{q'}$.

The following lemma describes the procedure's effect on the noise and plaintext. It says that the error is scaled by a factor of $q'/q$, plus a modulus-independent amount that depends only on the $\ell_\infty$ norm of $s' = t^{-1}s \in R^\vee$ (which was chosen from $\lfloor \psi \rceil_{R^\vee}$ and hence is short). It also shows that the procedure implicitly introduces a factor of $v = q' \cdot q^{-1} \in R_p$ into the message, which can be kept track of and removed upon decryption, because $q'$ is coprime with $p$ by assumption. In general, this extra factor seems inherent to modulus reduction, though it can be avoided by always using $q' = q \bmod p$, which always holds in the common case $p = 2$.

Lemma 8.9. If the noise in the input ciphertext is $e \in R^\vee$, then the noise $e' \in R^\vee$ in the output ciphertext satisfies $e' = q' \cdot q^{-1} \cdot e \bmod pR^\vee$. Moreover, $e'$ equals $(q'/q)e$ plus a random variable $f$ that, for any value of $e$, is 0-subgaussian with parameter

\[ p\sqrt{2\pi}\,\bigl(\mathrm{rad}(m)/m + \hat{m}\,\|t^{-1}s\|_\infty^2\bigr)^{1/2} \]

and for which $\|f\|_2 \le p\sqrt{n}\,\bigl(\sqrt{\mathrm{rad}(m)/m} + \sqrt{\hat{m}}\,\|t^{-1}s\|_\infty\bigr)$ always.

In particular, if $e$ is $\delta$-subgaussian then by Claim 2.1 so is $e'$, although it may not be independent of $e$.


Proof  Since  both  e,  e'  E  R'J  and  q  is  coprime  with  p,  showing  that  e'  =  v  ■  e  mod  pRf  is  equivalent  to 
showing  that  qe'  —  q'e  E  pRv.  The  latter  follows  immediately  from  the  definition  of  e!  and  the  fact  that 
qFj(x )  E  pj  always. 

The subgaussianity claim on $e' = (q'/q)e + (f_0 + f_1 s)$ follows by the fact that for any values of $c_0, c_1$,
the terms $f_0$ and $t f_1$ are 0-subgaussian with respective parameters $p\sqrt{2\pi}\,s_1(\vec{d})$ and $p\sqrt{2\pi}\,s_1(\vec{p})$; the bounds
on $s_1(\vec{d})$ and $s_1(\vec{p})$ given in Lemmas 6.2 and 4.3, respectively; and Claim 2.1. Similarly, the claim on $\|f\|_2$
follows from the fact that coordinate-wise randomized rounding to a coset of $pR^\vee$ (respectively, $pR$) using
basis $p \cdot \vec{d}$ (resp., $p \cdot \vec{p}$) always yields an element having $\ell_2$ norm bounded by $p\sqrt{n}\,s_1(\vec{d})$ (resp., $p\sqrt{n}\,s_1(\vec{p})$),
by Equation (2.5); and by the triangle inequality. □
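
To make the bookkeeping of the factor $v = q' \cdot q^{-1}$ from Lemma 8.9 concrete, the following Python sketch computes $v$ modulo $p$ and removes it from a plaintext residue. It is only an illustration under simplifying assumptions: it works with plain integers modulo $p$ rather than elements of $R_p$, the function names are hypothetical, and it relies on Python 3.8 or later for modular inverses via `pow`.

```python
def switch_factor(q: int, q_new: int, p: int) -> int:
    """Return v = q_new * q^{-1} mod p; requires gcd(q, p) = 1."""
    return (q_new * pow(q, -1, p)) % p


def remove_factor(scaled_msg: int, v: int, p: int) -> int:
    """Undo the factor v on a plaintext residue; requires gcd(v, p) = 1."""
    return (scaled_msg * pow(v, -1, p)) % p


# Illustrative values: switching from q = 97 to q' = 13 with p = 5.
v = switch_factor(97, 13, 5)
recovered = remove_factor((3 * v) % 5, v, 5)   # recovers the message 3
```

When $q' = q \bmod p$ the computed factor is 1, which matches the remark that the factor disappears in the common case $p = 2$ with odd moduli.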


8.3.2  Key  Switching/Degree  Reduction 

The key-switching procedure (also known as “degree reduction”) converts any degree-$k$ ciphertext $c(S)$
encrypted under a secret key $s \in R$, to a degree-1 ciphertext $c'(S')$ encrypted under a key $s' \in R$ (which may
or may not be the same as $s$). Notice that when decrypting $c(S)$, the evaluation $c(s)$ is simply a linear function
in the powers $s^0, s^1, \ldots, s^k \in R$. The main idea behind the key-switching method introduced in [BV11a] is
to homomorphically apply this linear function to suitable encryptions (under $s'$) of these powers; we refer
to these ciphertexts as the key-switching “hint.” Implementing this idea requires some care in our setting,
however, due to the different powers of $R^\vee$ involved in the operations and their homomorphic counterparts.


Rewriting the decryption relation. Let $\mathcal{I} = (R^\vee)^k$ and $d = k + 1$, let $\vec{s} = (s^0, \ldots, s^k) \in R^{[d]}$, and let
$\vec{c} \in \mathcal{I}_q^{[d]}$ be the coefficient vector of a valid degree-$k$ ciphertext $c(S)$. Then for a degree-$k$ ciphertext $c$, we
have the decryption relation

$$\langle \vec{c}, \vec{s} \rangle = e \bmod q\mathcal{I}$$

for some short (subgaussian) $e \in t^{-k}\mu + p\mathcal{I}$. We first put this relation in a more convenient form, viewing
the ciphertext in the slightly “denser” quotient $\hat{m}^{1-k}R^\vee_q$ (because $\hat{m}^{1-k}R^\vee \supseteq \mathcal{I}$), and then scaling it up by a
$\hat{m}^{k-1}$ factor.¹ We also multiply and divide $\vec{c}$ and $\vec{s}$ (respectively) by $t$, yielding

$$\langle \underbrace{t \cdot \hat{m}^{k-1}\vec{c}}_{\vec{y} \,\in\, R_q^{[d]}},\; t^{-1}\vec{s} \rangle = \hat{m}^{k-1} e \bmod qR^\vee.$$

We write the relation in this way so that $t^{-1}\vec{s}$ is over $R^\vee$, which is the appropriate domain for encrypting it in
the key-switching hint, and so that $\vec{y}$ is over $R_q$, which will be needed for decomposing it into short elements
of $R$ as part of the key-switching operation.

¹This is essentially the same idea used in decoding $\mathcal{I}_q$ to $\mathcal{I}$, as described in Section 6.2.

We finally make one more important change to the decryption relation. Let $\ell = \lceil \log_2 q \rceil$ and define

$$\vec{g} = (1, 2, 4, \ldots, 2^{\ell-1}) \in \mathbb{Z}_q^{[\ell]} \quad\text{and}\quad G = I_{[d]} \otimes \vec{g}^T \in \mathbb{Z}_q^{[d] \times [d\ell]}. \qquad (8.2)$$

Then for any $\vec{x} \in R^{[d\ell]}$ such that $G\vec{x} = \vec{y} \in R_q^{[d]}$, we have

$$\langle \vec{x}, t^{-1}G^T\vec{s} \rangle = \langle G\vec{x}, t^{-1}\vec{s} \rangle = \hat{m}^{k-1} e \bmod qR^\vee. \qquad (8.3)$$

The hint will consist essentially of an encryption of $t^{-1}G^T\vec{s}$, and the key-switching operation will
homomorphically compute its inner product with a short (subgaussian) $\vec{x}$ so as to keep the error in the resulting
ciphertext small. The need for a short $\vec{x}$ is why we arranged for $\vec{y}$ to be over $R_q$, because we always have a
good basis for $R$ (namely, the powerful basis) that has nearly optimal spectral norm $s_1(\vec{p}) = \sqrt{\hat{m}}$, whereas
we do not always have such a good basis of $\mathcal{I} = (R^\vee)^k$ for $k > 1$.
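
The identity $\langle \vec{x}, G^T\vec{s} \rangle = \langle G\vec{x}, \vec{s} \rangle$ behind Equation (8.3) can be checked on a small example. The Python sketch below uses integers modulo $q$ in place of ring elements, drops the factor $t^{-1}$, and uses plain deterministic bit decomposition for $\vec{x}$ instead of the randomized subgaussian sampling the text calls for; all function names are hypothetical.

```python
def gadget_vector(q):
    """g = (1, 2, 4, ..., 2^(ell-1)) with ell = ceil(log2 q), for q >= 2."""
    ell = (q - 1).bit_length()
    return [1 << i for i in range(ell)]


def decompose(y, q):
    """Bit-decompose each entry of y (mod q); returns x with G x = y."""
    ell = (q - 1).bit_length()
    return [((yj % q) >> i) & 1 for yj in y for i in range(ell)]


def apply_G(x, q):
    """Compute G x mod q for G = I_d (tensor) g^T."""
    g = gadget_vector(q)
    ell = len(g)
    blocks = [x[j * ell:(j + 1) * ell] for j in range(len(x) // ell)]
    return [sum(gi * xi for gi, xi in zip(g, blk)) % q for blk in blocks]


def apply_G_transpose(s, q):
    """Compute G^T s mod q: every entry of s is scaled by each power of two."""
    g = gadget_vector(q)
    return [(gi * sj) % q for sj in s for gi in g]


def inner(u, v, q):
    return sum(ui * vi for ui, vi in zip(u, v)) % q
```

For example, with $q = 97$ each entry of $\vec{y}$ expands into $\ell = 7$ bits, and `inner(decompose(y, q), apply_G_transpose(s, q), q)` agrees with `inner(y, s, q)`.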

Alternative relations. As an optimization, we can actually omit the constant term 1 from $\vec{s}$. This decreases
the dimension $d$ by one, thereby reducing the size of the hint and the amount of extra noise introduced by
the key-switching procedure. For ciphertext $c(S) = \sum_{i=0}^{k} c_i S^i$ we then define $\vec{c} = (c_1, \ldots, c_k)$, so that
the main decryption relation becomes $c_0 + \langle \vec{c}, \vec{s} \rangle = e \bmod q\mathcal{I}$. The hint-generation and key-switching
procedures then work exactly as described below, with the additional step that we add the constant term
$\hat{m}^{k-1} c_0 \bmod qR^\vee$ to the output ciphertext $c'(S')$. This works because the key-switching procedure ensures
that $c'(s') \approx \hat{m}^{k-1}\langle \vec{c}, \vec{s} \rangle = \hat{m}^{k-1}(e - c_0) \bmod qR^\vee$.

Similarly, when the original and target secret keys are equal, i.e., $s' = s$, we can omit both 1 and $s$ from $\vec{s}$,
define $\vec{c} = (c_2, \ldots, c_k)$, and write the decryption relation as $(c_0 + c_1 s) + \langle \vec{c}, \vec{s} \rangle = e \bmod q\mathcal{I}$. We can then
apply the procedures below, adding the linear polynomial $\hat{m}^{k-1}(c_0 + c_1 S) \bmod qR^\vee$ to the output ciphertext
$c'(S)$ of the key-switching procedure.

Finally, the vector $\vec{g}$ need not contain only powers of 2, but may be defined with respect to a larger integer
base (thereby decreasing the dimension $\ell$), or may even consist of other exponentially increasing sequences.
The particular choice of $\vec{g}$ mainly affects the length (or subgaussian parameter) of the decomposition $\vec{x} \in R^{[d\ell]}$.
See [MP12] for further discussion.


Constructing the hint. The hint is a collection of independent degree-1 ciphertexts $h_i(S')$ for each $i \in [d\ell]$,
prepared as

$$h_i(S') \leftarrow \mathrm{Enc}_{s'}(0) + t^{-1}(G^T\vec{s})_i \bmod qR^\vee,$$

i.e., we generate degree-1 encryptions of 0 and simply add entries of $t^{-1}G^T\vec{s}$ to their constant terms. Notice
that by construction,

$$h_i(s') = f_i + t^{-1}(G^T\vec{s})_i \bmod qR^\vee$$

for some short (subgaussian) $f_i \in pR^\vee$ having distribution $\lfloor p \cdot \psi \rceil_{pR^\vee}$. Note also that $h_i(S')$ may not actually
be a well-formed encryption of any particular message, because $h_i(s')$ may not be congruent modulo $qR^\vee$ to
any short enough element of $R^\vee$; however, this does not matter for the key-switching application.

To the vector $\vec{f} = (f_i)_{i \in [d\ell]}$ of noise terms in the hint we associate a measure of quality $F$, defined as

$$F = \max_{i \in \mathbb{Z}_m^*} \Bigl( \sum_{j=1}^{d\ell} |\sigma_i(f_j)|^2 \Bigr)^{1/2} \qquad (8.4)$$

and bound it as follows.

Claim 8.10. If the entries $f_j \in R^\vee$ of $\vec{f}$ are all $\delta$-subgaussian with parameter $s$ for some $\delta = O(1)$, then

$$F \le C s \cdot \max\bigl(\sqrt{d\ell},\, \omega(\sqrt{\log n})\bigr)$$

except with negl(n) probability, for some universal constant $C > 0$.

Proof. For each $i$, write

$$\sum_{j=1}^{d\ell} |\sigma_i(f_j)|^2 = \sum_{j=1}^{d\ell} \Re(\sigma_i(f_j))^2 + \sum_{j=1}^{d\ell} \Im(\sigma_i(f_j))^2 \le 2 \cdot \max\Bigl( \sum_{j=1}^{d\ell} \Re(\sigma_i(f_j))^2,\ \sum_{j=1}^{d\ell} \Im(\sigma_i(f_j))^2 \Bigr).$$

Each of the $2n$ sums is a sum of squares of $d\ell$ independent $\delta$-subgaussian variables with parameter $s/\sqrt{2}$.
The claim now follows by applying Lemma 2.2 to each of the sums and applying the union bound. □


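The key-switching procedure. The procedure takes as input $\vec{c} \in \mathcal{I}_q^{[d]}$, computes $\vec{y} = t \cdot \hat{m}^{k-1}\vec{c} \in R_q^{[d]}$,
and generates, as described below, a short (subgaussian) $\vec{x} \in R^{[d\ell]}$ such that $G\vec{x} = \vec{y}$. It then outputs the
degree-1 ciphertext

$$c'(S') = \sum_{i \in [d\ell]} x_i \cdot h_i(S').$$

Notice that by Equation (8.3), evaluating $c'(S')$ at $S' = s'$ gives

$$c'(s') = \sum_{i \in [d\ell]} x_i \bigl(f_i + t^{-1}(G^T\vec{s})_i\bigr) = \langle \vec{x}, \vec{f} \rangle + \langle \vec{x}, t^{-1}G^T\vec{s} \rangle = \langle \vec{x}, \vec{f} \rangle + \hat{m}^{k-1} e \bmod qR^\vee.$$

Accordingly, we define the noise term in $c'$ to be $e' = \langle \vec{x}, \vec{f} \rangle + \hat{m}^{k-1} e \in R^\vee$. Notice that the noise is
congruent to $\hat{m}^{k-1} e$ modulo $pR^\vee$ because each $f_i \in pR^\vee$ by construction of the hint. The noise is also
relatively short: the $\hat{m}^{k-1}$ factor of $e$ is exactly offset by switching from modulus $q\mathcal{I} = q(R^\vee)^k$ to $qR^\vee$, and
$\langle \vec{x}, \vec{f} \rangle$ is short because both $\vec{x}$ and $\vec{f}$ are. (See Lemma 8.11 for a precise analysis.)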
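
The procedure can be traced end to end in a deliberately degenerate toy setting. The Python sketch below assumes $R = R^\vee = \mathbb{Z}$ with $t = \hat{m} = 1$, so ring elements become integers modulo $q$; it uses bit decomposition for $\vec{x}$ and tiny uniform noise in place of subgaussian sampling. The parameters `P` and `Q` and all function names are illustrative assumptions, and the construction offers no security.

```python
import random

P = 2                        # plaintext modulus (toy choice)
Q = 1_000_003                # odd ciphertext modulus, coprime with P (toy choice)
ELL = (Q - 1).bit_length()   # ell = ceil(log2 Q)


def centered(v, q=Q):
    """Representative of v mod q in (-q/2, q/2]."""
    v %= q
    return v - q if v > q // 2 else v


def encrypt_degree_k(mu, s, k, rng):
    """Toy degree-k ciphertext c with c(s) = e mod Q, where e = mu + P*noise."""
    e = mu + P * rng.randint(-2, 2)
    c = [0] + [rng.randrange(Q) for _ in range(k)]
    c[0] = (e - sum(ci * pow(s, i, Q) for i, ci in enumerate(c))) % Q
    return c, e


def make_hint(s, s_new, k, rng):
    """Degree-1 hint entries with h_i(s_new) = f_i + (G^T s_vec)_i mod Q."""
    hint, f = [], []
    for j in range(k + 1):               # entry s^j of the vector of powers
        for i in range(ELL):             # gadget entry 2^i
            a = rng.randrange(Q)
            fi = P * rng.randint(-2, 2)  # hint noise, a multiple of P
            target = (pow(2, i, Q) * pow(s, j, Q)) % Q
            hint.append(((fi + target - a * s_new) % Q, a))
            f.append(fi)
    return hint, f


def key_switch(c, hint):
    """Return c' = sum_i x_i * h_i, together with the bit vector x."""
    x = [(cj >> i) & 1 for cj in c for i in range(ELL)]
    c0 = sum(xi * h[0] for xi, h in zip(x, hint)) % Q
    c1 = sum(xi * h[1] for xi, h in zip(x, hint)) % Q
    return (c0, c1), x


def decrypt_degree_1(c, s_new):
    """Evaluate c at s_new, lift to the centered range, and reduce mod P."""
    return centered(c[0] + c[1] * s_new) % P
```

In this toy the new noise is exactly $\langle \vec{x}, \vec{f} \rangle + e$ over the integers, and decryption succeeds as long as that sum stays below $q/2$ in absolute value.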


Also note that while decrypting the original ciphertext $c(S)$ would yield the message $t^k e = \mu \bmod pR$,
the resulting degree-1 ciphertext $c'(S')$ decrypts to the message $t \cdot \hat{m}^{k-1} e = g^{k-1}\mu \bmod pR$. This means that
an implementation must keep track of the “true” underlying degree of each ciphertext (and limit homomorphic
additions to ciphertexts of equal “true” degree), even if its degree as a polynomial has been reduced via
key switching. Upon final decryption, the extra $g^{k-1}$ factor in the message can be removed as long as $g$ is
invertible modulo $p$, which by Corollary 2.18 is the case because we have assumed that $p$ is coprime with
every odd prime dividing $m$.

The next lemma says that the key-switching procedure introduces into the ciphertext some subgaussian
error, proportional to the quality $F$ of the noise vector $\vec{f}$ in the hint.

Lemma 8.11. Fix an arbitrary vector $\vec{f}$ and let $F$ be as defined in Equation (8.4). Assume that for some
$\delta = O(1)$, every entry $x_j \in R$ of $\vec{x}$ is $\delta$-subgaussian with parameter $s'$, conditioned on any values of the
ciphertext $c$ and $x_1, \ldots, x_{j-1}$. Then for any value of the original noise term $e$, the additional noise term
$\langle \vec{x}, \vec{f} \rangle$ is $(d\ell)\delta$-subgaussian with parameter $F s'$. In particular, if $e$ is $\delta$-subgaussian with parameter $s''$ then
the new noise term $e' = \langle \vec{x}, \vec{f} \rangle + \hat{m}^{k-1} e$ is $(d\ell + 1)\delta$-subgaussian with parameter $\sqrt{(F s')^2 + (\hat{m}^{k-1} s'')^2}$.


Proof. The subgaussianity claim on $\langle \vec{x}, \vec{f} \rangle$ follows directly from Claim 2.4. The claim on $e'$ is immediate by
Claim 2.1. □

Choosing $\vec{x}$. In the key-switching procedure we need to sample a subgaussian $\vec{x} \in R^{[d\ell]}$ such that $G\vec{x} = \vec{y}$
for a given $\vec{y} \in R_q^{[d]}$, where $G \in \mathbb{Z}_q^{[d] \times [d\ell]}$ is as defined in Equation (8.2). To start, define the $R$-module

$$\Lambda^\perp(G) = \{ \vec{z} \in R^{[d\ell]} : G\vec{z} = 0 \in R_q^{[d]} \},$$

which may also be seen as a lattice in $H^{[d\ell]}$. The set of all solutions $\vec{x}$ to $G\vec{x} = \vec{y}$ is then a coset of this
module. (A solution always exists, because $G$ contains the identity $I_{[d]}$ as a submatrix.) Given a high-quality
$\mathbb{Z}$-basis of $\Lambda^\perp(G)$, we can use any of the methods described in Section 2.4.2 for subgaussian sampling over
the desired lattice coset, e.g., coordinate-wise randomized rounding. As usual, the relevant measures of basis
quality are the largest singular value and the maximum length of the Gram-Schmidt orthogonalized vectors.


Lemma 8.12. There is an efficiently computable $\mathbb{Z}$-basis $Z \in R^{[d\ell] \times [d\ell n]}$ of $\Lambda^\perp(G)$ satisfying the following
bounds, where $\|\tilde{Z}\|_2$ denotes the largest $\ell_2$ norm of the Gram-Schmidt orthogonalized vectors $\tilde{Z}$. If $q$ is a
power of 2, then $s_1(Z) \le 3\sqrt{\hat{m}}$ and $\|\tilde{Z}\|_2 = 2\sqrt{n}$; otherwise, $s_1(Z) \le \sqrt{(9 + \mathrm{wt}_2(q))\,\hat{m}}$ and $\|\tilde{Z}\|_2 = \sqrt{5n}$,
where $\mathrm{wt}_2(q)$ denotes the number of 1s in the binary expansion of $q$.


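The remainder of this section is dedicated to a proof of the lemma. We first recall that Micciancio and
Peikert [MP12, Section 4] constructed good bases for the integer lattices

$$\mathcal{L}^\perp(G) = \{ \vec{z} \in \mathbb{Z}^{[d\ell]} : G\vec{z} = 0 \in \mathbb{Z}_q^{[d]} \},$$

which we now briefly summarize (see that work for further details and full proofs). Recalling that
$G = I_{[d]} \otimes \vec{g}^T \in \mathbb{Z}_q^{[d] \times [d\ell]}$ where $\vec{g} = (1, 2, 4, \ldots, 2^{\ell-1}) \in \mathbb{Z}_q^{[\ell]}$ and $\ell = \lceil \log_2 q \rceil$, define $S_g \in \mathbb{Z}^{[\ell] \times [\ell]}$ as

$$S_g = \begin{pmatrix} 2 & & & & \\ -1 & 2 & & & \\ & -1 & \ddots & & \\ & & \ddots & 2 & \\ & & & -1 & 2 \end{pmatrix} \text{ if } q = 2^\ell, \quad\text{otherwise}\quad S_g = \begin{pmatrix} 2 & & & & q_0 \\ -1 & 2 & & & q_1 \\ & -1 & \ddots & & q_2 \\ & & \ddots & & \vdots \\ & & & 2 & q_{\ell-2} \\ & & & -1 & q_{\ell-1} \end{pmatrix}$$

where $q = (q_{\ell-1} \cdots q_1 q_0)_2 = \sum_{i \in [\ell]} 2^i q_i$ for $q_i \in \{0, 1\}$ is the binary representation of $q$. It is clear by
inspection that the columns of $S_g$ are all in the lattice $\mathcal{L}^\perp(\vec{g}^T)$; moreover, as shown in [MP12], they are
indeed a basis of the lattice. (This can be seen by verifying that the determinants of $S_g$ and $\mathcal{L}^\perp(\vec{g}^T)$ are
equal.) It immediately follows that $S = I_{[d]} \otimes S_g \in \mathbb{Z}^{[d\ell] \times [d\ell]}$ is a basis for the lattice $\mathcal{L}^\perp(G)$.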
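
The two properties just cited (every column of $S_g$ lies in the lattice, and its determinant matches that of the lattice, namely $q$) can be checked mechanically for small moduli. The Python sketch below builds $S_g$ as reconstructed above; the function names are hypothetical, and the cofactor-expansion determinant is only meant for the small $\ell$ used in such checks.

```python
def gadget_basis(q):
    """Build S_g as a list of rows for modulus q >= 2, with ell = ceil(log2 q)."""
    ell = (q - 1).bit_length()
    S = [[0] * ell for _ in range(ell)]
    for j in range(ell):
        S[j][j] = 2                  # diagonal entries
        if j + 1 < ell:
            S[j + 1][j] = -1         # subdiagonal entries
    if q != 1 << ell:                # q is not a power of two:
        for i in range(ell):         # last column holds the bits of q
            S[i][ell - 1] = (q >> i) & 1
    return S


def det(M):
    """Integer determinant by cofactor expansion (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def columns_in_lattice(S, q):
    """Check that g^T times every column of S vanishes modulo q."""
    ell = len(S)
    g = [1 << i for i in range(ell)]
    return all(sum(g[i] * S[i][j] for i in range(ell)) % q == 0 for j in range(ell))
```

For instance, `gadget_basis(11)` has last column `(1, 1, 0, 1)`, the bits of 11 from least to most significant.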

In [MP12] it is shown that $\|\tilde{S}\|_2 = \|\tilde{S}_g\|_2 = 2$ if $q = 2^\ell$ (where we orthogonalize from right to left),
and is $\sqrt{5}$ otherwise (where we orthogonalize from left to right). It also directly follows from the triangle
inequality and Pythagorean theorem that $s_1(S) = s_1(S_g) \le 3$ if $q = 2^\ell$, and is at most $\sqrt{9 + \mathrm{wt}_2(q)}$
otherwise.

We now claim that

$$Z = S \otimes \vec{p}^T = I_{[d]} \otimes S_g \otimes \vec{p}^T \in R^{[d\ell] \times [d\ell n]}$$

is a $\mathbb{Z}$-basis of $\Lambda^\perp(G)$ satisfying the bounds in Lemma 8.12, where $\vec{p}$ is the powerful basis of $R$. For
the bounds, observe that by Lemma 4.3, the fact that the longest Gram-Schmidt orthogonalized vector of
$\sigma(\vec{p}^T) = \mathrm{CRT}_m$ has length $\sqrt{n}$, and the properties of singular values and orthogonalization under the tensor
product, we have

$$s_1(Z) = s_1(S) \cdot s_1(\vec{p}) = s_1(S) \cdot \sqrt{\hat{m}} \quad\text{and}\quad \|\tilde{Z}\|_2 = \|\tilde{S}\|_2 \cdot \|\widetilde{\mathrm{CRT}}_m\|_2 = \|\tilde{S}\|_2 \cdot \sqrt{n},$$

which when combined with the above bounds from [MP12] yields the claim. It only remains to show that $Z$
is a $\mathbb{Z}$-basis of $\Lambda^\perp(G)$, which is a consequence of the following simple lemma.

Lemma 8.13. Let $A \in \mathbb{Z}_q^{[h] \times [k]}$ for some $h, k \ge 1$ be arbitrary. If $B$ is any $\mathbb{Z}$-basis of $\mathcal{L}^\perp(A) \subseteq \mathbb{Z}^{[k]}$ and $\vec{b}$
is any $\mathbb{Z}$-basis of $R$, then $B \otimes \vec{b}^T$ is a $\mathbb{Z}$-basis of $\Lambda^\perp(A) \subseteq R^{[k]}$.

Proof. Clearly, every element of $B \otimes \vec{b}^T$ is in $\Lambda^\perp(A)$. To show that it is a basis, let $\vec{z} \in \Lambda^\perp(A)$ be arbitrary,
so $A\vec{z} = 0 \in R_q^{[h]}$. Then we can uniquely write $\vec{z} = \sum_j b_j \cdot \vec{z}_j$ for some vectors $\vec{z}_j \in \mathbb{Z}^{[k]}$. By linearity
and uniqueness with respect to $\vec{b}$, this implies that $A\vec{z}_j = 0 \in \mathbb{Z}_q^{[h]}$ for every $j$, so each $\vec{z}_j \in \mathcal{L}^\perp(A)$ can be
written uniquely as a $\mathbb{Z}$-linear combination of elements in $B$. It follows that $\vec{z}$ can be expressed uniquely as a
$\mathbb{Z}$-linear combination of elements in $B \otimes \vec{b}^T$. □


Abstract

We  show  that  the  Learning  with  Errors  (LWE)  problem  is  classically  at  least  as  hard  as  standard 
worst-case  lattice  problems,  even  with  polynomial  modulus.  Previously  this  was  only  known  under 
quantum  reductions. 

Our techniques capture the tradeoff between the dimension and the modulus of LWE instances, leading
to a much better understanding of the landscape of the problem. The proof is inspired by techniques
from several recent cryptographic constructions, most notably fully homomorphic encryption schemes.


1  Introduction 


Over the last decade, lattices have emerged as a very attractive foundation for cryptography. The appeal of
lattice-based primitives stems from the fact that their security can be based on worst-case hardness
assumptions, that they appear to remain secure even against quantum computers, that they can be quite efficient,
and that, somewhat surprisingly, for certain advanced tasks such as fully homomorphic encryption no other
cryptographic assumption is known to suffice.

Virtually all recent lattice-based cryptographic schemes are based directly upon one of two natural
average-case problems that have been shown to enjoy worst-case hardness guarantees: the short integer
solution (SIS) problem and the learning with errors (LWE) problem. The former dates back to Ajtai’s
groundbreaking work [Ajt96], who showed that it is at least as hard as approximating several worst-case lattice
problems, such as the (decision version of the) shortest vector problem, known as GapSVP, to within a
polynomial factor in the lattice dimension. This hardness result was tightened in followup work (e.g., [MR04]),
leading to a somewhat satisfactory understanding of the hardness of the SIS problem. The SIS problem
has been the foundation for one-way [Ajt96] and collision-resistant hash functions [GGH96], identification
schemes [MV03, Lyu08, KTX08], and digital signatures [GPV08, CHKP10, Boy10, MP12, Lyu12].

Our focus in this paper is on the latter problem, learning with errors. In this problem our goal is to
distinguish with some non-negligible advantage between the following two distributions:

$$\bigl((a_i, \langle a_i, s \rangle + e_i \bmod q)\bigr)_i \quad\text{and}\quad \bigl((a_i, u_i)\bigr)_i,$$

where $s$ is chosen uniformly from $\mathbb{Z}_q^n$ and so are the $a_i \in \mathbb{Z}_q^n$, the $u_i$ are chosen uniformly from $\mathbb{Z}_q$, and
the “noise” $e_i \in \mathbb{Z}$ is sampled from some distribution supported on small numbers, typically a (discrete)
Gaussian distribution with standard deviation $\alpha q$ for $\alpha = o(1)$.
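
As a concrete illustration of the two distributions, the Python sketch below draws LWE samples and uniform samples. It is a toy under stated assumptions: the noise is a rounded continuous Gaussian with standard deviation $\alpha q$, used as a stand-in for a true discrete Gaussian, the parameters in the usage note are arbitrary, and the function names are hypothetical.

```python
import random


def centered(v, q):
    """Representative of v mod q in (-q/2, q/2]."""
    v %= q
    return v - q if v > q // 2 else v


def lwe_samples(s, q, alpha, count, rng):
    """Draw `count` pairs (a, <a, s> + e mod q) for the secret s."""
    samples = []
    for _ in range(count):
        a = [rng.randrange(q) for _ in range(len(s))]
        e = round(rng.gauss(0.0, alpha * q))   # rounded Gaussian noise
        b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
        samples.append((a, b))
    return samples


def uniform_samples(n, q, count, rng):
    """Draw `count` pairs (a, u) with a and u uniform."""
    return [([rng.randrange(q) for _ in range(n)], rng.randrange(q))
            for _ in range(count)]


def residual(sample, s, q):
    """b - <a, s> mod q, centered; small for LWE samples when s is known."""
    a, b = sample
    return centered(b - sum(ai * si for ai, si in zip(a, s)), q)
```

With the secret in hand, `residual` is small on LWE samples and spread over the whole range on uniform ones; the decision problem asks to tell the two apart without knowing $s$.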

The LWE problem has proved to be amazingly versatile, serving as the basis for a multitude of
cryptographic constructions: secure public-key encryption under both chosen-plaintext [Reg05, PVW08, LP11]
and chosen-ciphertext [PW08, Pei09, MP12] attacks, oblivious transfer [PVW08], identity-based
encryption [GPV08, CHKP10, ABB10a, ABB10b], various forms of leakage-resilient cryptography (e.g., [AGV09,
ACPS09, GKPV10]), fully homomorphic encryption [BV11, BGV12, Bra12] (following the seminal work
of Gentry [Gen09]), and much more. It was also used to show hardness of learning problems [KS06].

Contrary to the SIS problem, however, the hardness of LWE is not sufficiently well understood. The main
hardness reduction for LWE [Reg05] is similar to the one for SIS mentioned above, except that it is quantum.
This means that the existence of an efficient algorithm for LWE, even a classical (i.e., non-quantum) one,
only implies the existence of an efficient quantum algorithm for lattice problems. This state of affairs is quite
unsatisfactory: even though one might conjecture that efficient quantum algorithms for lattice problems do
not exist, our understanding of quantum algorithms is still in its infancy. It is therefore highly desirable to
come up with a classical hardness reduction for LWE.

Progress in this direction was made by [Pei09] (with some simplifications in the followup by
Lyubashevsky and Micciancio [LM09]). The main result there is that LWE with exponential modulus is as hard
as some standard lattice problems using a classical reduction. As that hardness result crucially relies on
the exponential modulus, the open question remained as to whether LWE is hard for smaller moduli, in
particular polynomial moduli. In addition to being an interesting question in its own right, this question is of
special importance since many cryptographic applications, as well as the learning theory result of Klivans
and Sherstov [KS06], are instantiated in this setting. Some additional evidence that reducing the modulus is
a fundamental question comes from the Learning Parity with Noise (LPN) problem, which can be seen as
LWE with modulus 2 (albeit with a different error distribution), and whose hardness is a long-standing open
question. We remark that [Pei09] does include a classical hardness of LWE with polynomial modulus, albeit
one based on a non-standard lattice problem, whose hardness is arguably as debatable as that of the LWE
problem itself.

To summarize, prior to our work, the existence of an efficient algorithm for LWE with polynomial
modulus was only known to imply an efficient quantum algorithm for lattice problems, or an efficient classical
algorithm for a non-standard lattice problem. While both consequences are unlikely, they are arguably not
as earth-shattering as an efficient classical algorithm for lattice problems. Hence, some concern about the
hardness of LWE persisted, tainting the plethora of cryptographic applications based on it.


Main result. We provide the first classical hardness reduction of LWE with polynomial modulus. Our
reduction is the first to show that the existence of an efficient classical algorithm for LWE with any
subexponential modulus would indeed have earth-shattering consequences: it would imply an efficient algorithm
for worst-case instances of standard lattice problems.

Theorem 1.1 (Informal). Solving $n$-dimensional LWE with poly($n$) modulus implies an equally efficient
solution to a worst-case lattice problem in dimension $\sqrt{n}$.

As a result, we establish the hardness of all known applications of polynomial-modulus LWE based on
classical worst-case lattice problems, previously only known under a quantum assumption.

Techniques.  Even  though  our  main  theorem  has  the  flavor  of  a  statement  in  computational  complexity,  its 
proof  crucially  relies  on  a  host  of  ideas  coming  from  recent  progress  in  cryptography,  most  notably  recent 
breakthroughs  in  the  construction  of  fully  homomorphic  encryption  schemes. 

At a high level, our main theorem is a “modulus reduction” result: we show a reduction from LWE
with large modulus $q$ and dimension $n$ to LWE with (small) modulus $p = \mathrm{poly}(n)$ and dimension $n \log_2 q$.
Theorem 1.1 now follows from the main result in [Pei09], which shows that the former problem with $q = 2^n$
is as hard as $n$-dimensional GapSVP. We note that the increase in dimension from $n$ to $n \log_2 q$ is to be
expected, as it essentially preserves the number of possible secrets (and hence the running time of the naive
brute-force algorithm).

Very roughly speaking, the main idea in modulus reduction is to map $\mathbb{Z}_q$ into $\mathbb{Z}_p$ through the naive
mapping that sends any $a \in \{0, \ldots, q-1\}$ to $\lfloor pa/q \rfloor \in \{0, \ldots, p-1\}$. This basic idea is confounded by
two issues. The first is that if carried out naively, this transformation introduces rounding artifacts into LWE,
ruining the distribution of the output. We resolve this issue by using a more careful Gaussian randomized
rounding procedure (Section 3). A second serious issue is that in order for the rounding errors not to be
amplified when multiplied by the LWE secret $s$, it is essential to assume that $s$ has small coordinates. A major
part of our reduction (Section 4) is therefore dedicated to showing a reduction from LWE (in dimension $n$)
with arbitrary secret in $\mathbb{Z}_q^n$ to LWE (in dimension $n \log_2 q$) with a secret chosen uniformly over $\{0, 1\}$. This
follows from a careful hybrid argument (Section 4.3) combined with a hardness reduction to the so-called
“extended-LWE” problem, which is a variant of LWE in which we have some control over the error vector
(Section 4.2).
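
The role of the small secret can be seen in a toy version of the naive mapping. The Python sketch below rescales a sample from modulus $q$ to modulus $p$ by deterministic rounding to the nearest integer (a simplification: the text states the map with a floor and ultimately replaces it by Gaussian randomized rounding) and measures the resulting noise for a binary secret. The uniform noise, the parameter values in the usage note, and the function names are all illustrative assumptions.

```python
import random


def round_div(x, p, q):
    """Nearest integer to p*x/q (ties rounded up), in exact integer arithmetic."""
    return (2 * p * x + q) // (2 * q)


def centered(v, m):
    """Representative of v mod m in (-m/2, m/2]."""
    v %= m
    return v - m if v > m // 2 else v


def binary_lwe_sample(s, q, noise_bound, rng):
    """One sample (a, b) with b = <a, s> + e mod q; also returns e for inspection."""
    a = [rng.randrange(q) for _ in s]
    e = rng.randint(-noise_bound, noise_bound)   # uniform toy noise
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
    return a, b, e


def switch_modulus(a, b, p, q):
    """Rescale every coordinate of the sample from modulus q to modulus p."""
    return [round_div(ai, p, q) % p for ai in a], round_div(b, p, q) % p


def residual(a, b, s, m):
    """Noise b - <a, s> mod m, centered."""
    return centered(b - sum(ai * si for ai, si in zip(a, s)), m)
```

Each coordinate contributes a rounding error of at most 1/2, multiplied by the matching secret coordinate, so for $s \in \{0,1\}^n$ the new noise is at most $p|e|/q + (n+1)/2$ in absolute value. With an arbitrary secret in $\mathbb{Z}_q^n$ the same rounding errors would be scaled by values as large as $q$, which is the amplification the text warns about.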

We stress that even though our proof is inspired by and has analogues in the cryptographic literature,
the details of the reductions are very different. In particular, the idea of modulus reduction plays a key role
in recent work on fully homomorphic encryption schemes, giving a way to control the noise growth during
homomorphic operations [BV11, BGV12, Bra12]. However, since the goal there is merely to preserve the
functionality of the scheme, their modulus reduction can be performed in a rather naive way similar to
the one outlined above, and so the output of their procedure does not constitute a valid LWE instance. In
our reduction we need to perform a much more delicate modulus reduction, which we do using Gaussian
randomized rounding, as mentioned above.

The idea of reducing LWE to have a $\{0, 1\}$ secret also exists already in the cryptographic literature:
precisely such a reduction was shown by Goldwasser et al. [GKPV10], who were motivated by questions
in leakage-resilient cryptography. Their reduction, however, incurred a severe blow-up in the noise rate,
making it useless for our purposes. In more detail, not being able to faithfully reproduce the LWE distribution
in the output, they resort to hiding the faults in the output distribution under a huge independent fresh noise,
in order to make it close to the correct one. The trouble with this “noise flooding” approach is that the
amount of noise one has to add depends on the running time of the algorithm solving the target $\{0, 1\}$-LWE
problem, which in turn forces the modulus to be equally big. So while in principle we could use
the reduction from [GKPV10] (and shorten our proof by about a half), this would lead to a qualitatively
much weaker result: the modulus and the approximation ratio for the worst-case lattice problem would both
grow with the running time of the $\{0, 1\}$-LWE algorithm. In particular, we would not be able to show that
for some fixed polynomial modulus, LWE is a hard problem; instead, in order to capture all polynomial
time algorithms, we would have to take a super-polynomial modulus, and rely on the hardness of worst-case
lattice problems to within super-polynomial approximation factors. In contrast, with our reduction, the
modulus and the approximation ratio both remain fixed independently of the target $\{0, 1\}$-LWE algorithm.

As mentioned above, our alternative to the reduction in [GKPV10] is based on a hybrid argument
combined with a new hardness reduction for the “extended LWE” problem, which is a variant of LWE in which
in addition to the LWE samples, we also get to see the inner product of the vector of error terms with a
vector $z$ of our choosing. This problem has its origins in the cryptographic literature, namely in the work
of O’Neill, Peikert, and Waters [OPW11] on (bi)deniable encryption and the later work of Alperin-Sheriff
and Peikert [AP12] on key-dependent message security. The hardness reductions included in those papers
are not sufficient for our purposes, as they cannot handle large moduli or error terms, which is crucial in
our setting. We therefore provide an alternative reduction which is conceptually much simpler, and
essentially subsumes both previous reductions. Our reduction works equally well with exponential moduli and
correspondingly long error vectors, a case earlier reductions could not handle.


Broader perspective. As a byproduct of the proof of Theorem 1.1 we obtain several results that shed
new light on the hardness of LWE. Most notably, our modulus reduction result in Section 3 is actually far
more general, and can be used to show a “modulus expansion/dimension reduction” tradeoff. Namely, it
shows a reduction from LWE in dimension $n$ and modulus $p$ to LWE in dimension $n/k$ and modulus $p^k$
(see Corollary 3.4). Combined with our modulus reduction, this has the following interesting consequence:
the hardness of $n$-dimensional LWE with modulus $q$ is a function of the quantity $n \log_2 q$. In other words,
varying $n$ and $q$ individually while keeping $n \log_2 q$ fixed essentially preserves the hardness of LWE.
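
A short worked calculation shows why the tradeoff keeps $n \log_2 q$ unchanged. The numbers in the example are arbitrary and chosen only for illustration.

```latex
% Trading dimension n, modulus p for dimension n/k, modulus p^k:
\[
  \frac{n}{k} \cdot \log_2\!\bigl(p^k\bigr)
  \;=\; \frac{n}{k} \cdot k \log_2 p
  \;=\; n \log_2 p .
\]
% Illustrative instance: n = 1024, p = 2^{12}, k = 4.
\[
  1024 \cdot \log_2 2^{12} = 12288
  \qquad\text{and}\qquad
  256 \cdot \log_2 2^{48} = 12288 .
\]
```

Both instances have secrets of 12288 bits, which is the quantity the text identifies as governing hardness.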

Although we find this statement quite natural (since $n \log_2 q$ represents the number of bits in the secret),
it has some surprising consequences. One is that $n$-dimensional LWE with modulus $2^n$ is essentially as
hard as $n^2$-dimensional LWE with polynomial modulus. As a result, $n$-dimensional LWE with modulus $2^n$,
which was shown in [Pei09] to be as hard as $n$-dimensional lattice problems using a classical reduction, is
actually as hard as $n^2$-dimensional lattice problems using a quantum reduction. The latter is presumably a
much harder problem, requiring $\exp(\tilde{\Omega}(n^2))$ time to solve. This corollary highlights an inherent quadratic
loss in the classical reduction of [Pei09] (and as a result also our Theorem 1.1) compared to the quantum
one in [Reg05].

A second interesting consequence is that 1-dimensional LWE with modulus $2^n$ is essentially as hard
as $n$-dimensional LWE with polynomial modulus. The 1-dimensional version of LWE is closely related
to the Hidden Number Problem of Boneh and Venkatesan [BV96]. It is also essentially equivalent to the
Ajtai-Dwork-type [AD97] cryptosystem in [Reg03], as follows from simple reductions similar to the one
in the appendix of [Reg10a]. Moreover, the 1-dimensional version can be seen as a special case of the
Ring-LWE problem introduced in [LPR10] (for ring dimension 1, i.e., ring equal to $\mathbb{Z}$). This allows us,
via the ring switching technique from [GHPS12], to obtain the first hardness proof of Ring-LWE, with
arbitrary ring dimension and exponential modulus, under the hardness of problems on general lattices (as
opposed to just ideal lattice problems). In addition, this leads to the first hardness proof for the Ring-SIS
problem [LM06, PR06] with exponential modulus under the hardness of general lattice problems, via the
standard LWE-to-SIS reduction. (We note that since both results are obtained by scaling up from a ring of
dimension 1, the hardness does not improve as the ring dimension increases.)

A final interesting consequence of our reductions is that (the decision form of) LWE is hard with an
arbitrary huge modulus, e.g., a prime; see Corollary 3.3. Previous results (e.g., [Reg05, Pei09, MM11,
MP12]) required the modulus to be smooth, i.e., all its prime divisors had to be polynomially bounded.


Open questions. As mentioned above, our Theorem 1.1 inherits from [Pei09] a quadratic loss in the
dimension, which does not exist in the quantum reduction [Reg05] nor in the known hardness reductions
for SIS. At a technical level, this quadratic loss stems from the fact that the reduction in [Pei09] is not
iterative. In contrast, the quantum reduction in [Reg05] as well as the reductions for SIS are iterative, and as
a result do not incur the quadratic loss. We note that an additional side effect of the non-iterative reduction
is that the hardness in Theorem 1.1 and [Pei09] is based only on the worst-case lattice problem GapSVP
(and the essentially equivalent BDD and uSVP [LM09]), and not on problems like SIVP, which the quantum
reduction of [Reg05] and the hardness reductions for SIS can handle. One case where this is very significant
is when dealing with ideal lattices, as in the hardness reduction for Ring-LWE, since GapSVP turns out to
be an easy problem there.

We  therefore  believe  that  it  is  important  to  understand  whether  there  exists  a  classical  reduction  that 
does not incur the quadratic loss inherent in [Pei09] and in Theorem 1.1. In other words, is $n$-dimensional
LWE with polynomial modulus classically as hard as $n$-dimensional lattice problems (as opposed to
$\sqrt{n}$-dimensional)? This would constitute the first full dequantization of the quantum reduction in [Reg05].

While  it  is  natural  to  conjecture  that  the  answer  to  this  question  is  positive,  a  negative  answer  would  be 
quite tantalizing. In particular, it is conceivable that there exists a (classical) algorithm for LWE with
polynomial modulus running in time $2^{O(\sqrt{n})}$. Due to the quadratic expansion in Theorem 1.1, this would not
lead to a faster classical algorithm for lattice problems; it would, however, lead to a $2^{O(\sqrt{n})}$-time quantum
algorithm for lattice problems using the reduction in [Reg05]. The latter would be major progress in quantum
algorithms, yet is not entirely unreasonable; in fact, a $2^{O(\sqrt{n})}$-time quantum algorithm for a somewhat
related quantum task was discovered by Kuperberg [Kup05] (see also [Reg02]).


2  Preliminaries 

Let $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ denote the cycle, i.e., the additive group of reals modulo 1. We also denote by $\mathbb{T}_q$ its cyclic
subgroup of order $q$, i.e., the subgroup given by $\{0, 1/q, \ldots, (q-1)/q\}$.

For  two  probability  distributions  P,  Q  over  some  discrete  domain,  we  define  their  statistical  distance  as 
$\sum_i |P(i) - Q(i)|/2$ where $i$ ranges over the distribution domain, and extend this to continuous distributions
in the obvious way. We recall the following easy fact (see, e.g., [AD87, Eq. (2.3)] for a proof).

Claim 2.1. If $P$ and $Q$ are two probability distributions such that $P(i) \geq (1 - \varepsilon)Q(i)$ holds for all $i$, then
the statistical distance between $P$ and $Q$ is at most $\varepsilon$.
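
As a small numerical illustration of Claim 2.1, the following Python sketch computes the statistical distance of two finite distributions together with the smallest $\varepsilon$ for which the hypothesis of the claim holds. The function names and the toy distributions are illustrative assumptions and do not appear in the original text.

```python
def statistical_distance(p, q):
    """Half the L1 distance between two distributions given as dicts outcome -> probability."""
    support = set(p) | set(q)
    return sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in support) / 2


def smallest_epsilon(p, q):
    """Smallest eps >= 0 with p(i) >= (1 - eps) * q(i) for every outcome i with q(i) > 0."""
    return max(0.0, max(1.0 - p.get(i, 0.0) / q[i] for i in q if q[i] > 0))


# Toy distributions (hypothetical example values).
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.4, "b": 0.4, "c": 0.2}

dist = statistical_distance(P, Q)   # |0.1| + |0.1| + 0, halved
eps = smallest_epsilon(P, Q)        # driven by outcome "b": 1 - 0.3/0.4
```

Here the distance is 0.1 while the hypothesis holds with $\varepsilon = 0.25$, consistent with the claim.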

We will use the following immediate corollary of the leftover hash lemma [ILL89].

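Lemma 2.2. Let $k, n, q \geq 1$ be integers, and $\varepsilon > 0$ be such that $n \geq k \log_2 q + 2\log_2(1/\varepsilon)$. For $\mathbf{H} \leftarrow \mathbb{T}_q^{k \times n}$,
$\mathbf{z} \leftarrow \{0, 1\}^n$, $\mathbf{u} \leftarrow \mathbb{T}_q^k$, the distributions of $(\mathbf{H}, \mathbf{H}\mathbf{z})$ and $(\mathbf{H}, \mathbf{u})$ are within statistical distance at most $\varepsilon$.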
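
For very small parameters the statement of Lemma 2.2 can be checked by brute force. The sketch below is an illustration only: it identifies $\mathbb{T}_q$ with $\mathbb{Z}_q$ by scaling with $q$, and the function name `lhl_distance` is a hypothetical helper. It enumerates every $\mathbf{H}$ and every binary $\mathbf{z}$ and computes the exact statistical distance between $(\mathbf{H}, \mathbf{H}\mathbf{z})$ and $(\mathbf{H}, \mathbf{u})$.

```python
from itertools import product
from math import log2


def lhl_distance(k, n, q):
    """Exact statistical distance between (H, Hz) and (H, u), by exhaustive enumeration."""
    targets = q ** k
    total = 0.0
    num_H = 0
    for H in product(range(q), repeat=k * n):          # H as a flattened k x n matrix over Z_q
        counts = {}
        for z in product((0, 1), repeat=n):
            hz = tuple(sum(H[i * n + j] * z[j] for j in range(n)) % q for i in range(k))
            counts[hz] = counts.get(hz, 0) + 1
        # Distance between the law of Hz (for this fixed H) and the uniform law on Z_q^k.
        seen = sum(abs(c / 2 ** n - 1 / targets) for c in counts.values())
        unseen = (targets - len(counts)) / targets
        total += (seen + unseen) / 2
        num_H += 1
    return total / num_H


k, n, q = 1, 6, 2
dist = lhl_distance(k, n, q)
eps = 2 ** (-(n - k * log2(q)) / 2)   # smallest eps allowed by n >= k log2 q + 2 log2(1/eps)
```

For these toy values only the all-zero $\mathbf{H}$ contributes, giving distance $1/128$, well below the bound $\varepsilon = 2^{-5/2}$.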

A distinguishing problem $P$ is defined by two distributions $P_0$ and $P_1$, and a solution to the problem is
the ability to distinguish between these distributions. The advantage of an algorithm $\mathcal{A}$ with binary output
on $P$ is defined as

$$\mathrm{Adv}[\mathcal{A}] = \bigl|\Pr[\mathcal{A}(P_0)] - \Pr[\mathcal{A}(P_1)]\bigr| .$$

A reduction from a problem $P$ to a problem $Q$ is an efficient (i.e., polynomial-time) algorithm $\mathcal{A}^{\mathcal{B}}$ that
solves $P$ given access to an oracle $\mathcal{B}$ that solves $Q$. Most of our reductions (in fact all except the one in
Lemma 2.15) are what we call "transformation reductions": these reductions perform some transformation
to the input and then apply the oracle to the result.
2.1 Lattices


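An $n$-dimensional (full-rank) lattice $\Lambda \subseteq \mathbb{R}^n$ is the set of all integer linear combinations of some set of $n$
linearly independent basis vectors $\mathbf{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\} \subseteq \mathbb{R}^n$,

$$\Lambda = \mathcal{L}(\mathbf{B}) = \Bigl\{ \sum_{i \in [n]} z_i \mathbf{b}_i : \mathbf{z} \in \mathbb{Z}^n \Bigr\}.$$

The dual lattice of $\Lambda \subseteq \mathbb{R}^n$ is defined as $\Lambda^* = \{\mathbf{x} \in \mathbb{R}^n : \langle \Lambda, \mathbf{x} \rangle \subseteq \mathbb{Z}\}$.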
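
To make the definition of the dual lattice concrete, here is a minimal Python sketch for dimension 2. It uses the standard fact that the columns of $\mathbf{B}^{-T}$ form a basis of $\Lambda^*$; the helper names and the example basis are assumptions made for illustration.

```python
from fractions import Fraction


def dual_basis_2d(b1, b2):
    """Dual basis (d1, d2) of the 2-dimensional lattice with basis vectors b1, b2."""
    (a, c), (b, d) = b1, b2
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("basis vectors must be linearly independent")
    # Columns of the inverse transpose of the matrix with columns b1, b2.
    d1 = (d / det, -b / det)
    d2 = (-c / det, a / det)
    return d1, d2


def inner(x, y):
    """Euclidean inner product."""
    return sum(xi * yi for xi, yi in zip(x, y))


b1, b2 = (2, 0), (1, 3)             # hypothetical example basis
d1, d2 = dual_basis_2d(b1, b2)
```

Since $\langle \mathbf{b}_i, \mathbf{d}_j \rangle$ is 1 for $i = j$ and 0 otherwise, every integer combination of the $\mathbf{d}_j$ has integer inner product with every lattice vector.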

The minimum distance (or first successive minimum) $\lambda_1(\Lambda)$ of a lattice $\Lambda$ is the length of a shortest
nonzero lattice vector, i.e., $\lambda_1(\Lambda) = \min_{\mathbf{0} \neq \mathbf{x} \in \Lambda} \|\mathbf{x}\|$. For an approximation ratio $\gamma = \gamma(n) \geq 1$,
GapSVP$_\gamma$ is the problem of deciding, given a basis $\mathbf{B}$ of an $n$-dimensional lattice $\Lambda = \mathcal{L}(\mathbf{B})$ and a number
$d$, between the case where $\lambda_1(\mathcal{L}(\mathbf{B})) \leq d$ and the case where $\lambda_1(\mathcal{L}(\mathbf{B})) > \gamma d$. We refer to [Kho10, Reg10]
for a recent account on the computational complexity of GapSVP$_\gamma$.

2.2  Gaussian  measures 

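For $r > 0$, the $n$-dimensional Gaussian function $\rho_r : \mathbb{R}^n \to (0, 1]$ is defined as

$$\rho_r(\mathbf{x}) := \exp(-\pi \|\mathbf{x}\|^2 / r^2).$$

We extend this definition to sets, i.e., $\rho_r(A) = \sum_{\mathbf{x} \in A} \rho_r(\mathbf{x}) \in [0, +\infty]$ for any $A \subseteq \mathbb{R}^n$. The (spherical)
continuous Gaussian distribution $D_r$ is the distribution with density function proportional to $\rho_r$. More
generally, for a matrix $\mathbf{B}$, we denote by $D_{\mathbf{B}}$ the distribution of $\mathbf{B}\mathbf{x}$ where $\mathbf{x}$ is sampled from $D_1$. When $\mathbf{B}$
is nonsingular, its probability density function is proportional to

$$\exp(-\pi \mathbf{x}^T (\mathbf{B}\mathbf{B}^T)^{-1} \mathbf{x}).$$

A basic fact is that for any matrices $\mathbf{B}_1, \mathbf{B}_2$, the sum of a sample from $D_{\mathbf{B}_1}$ and an independent sample
from $D_{\mathbf{B}_2}$ is distributed like $D_{\mathbf{C}}$ for $\mathbf{C} = (\mathbf{B}_1\mathbf{B}_1^T + \mathbf{B}_2\mathbf{B}_2^T)^{1/2}$.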

For an $n$-dimensional lattice $\Lambda$ and a vector $\mathbf{u} \in \mathbb{R}^n$, we define the discrete Gaussian distribution $D_{\Lambda+\mathbf{u},r}$
as the discrete distribution with support on the coset $\Lambda + \mathbf{u}$ whose probability mass function is proportional
to $\rho_r$. There exists an efficient procedure that samples within negligible statistical distance of any (not too
narrow) discrete Gaussian distribution ([GPV08, Theorem 4.1]; see also [Pei10]). In the next lemma, proved
in Section 5, we modify this sampler so that the output is distributed exactly as a discrete Gaussian. This
also allows us to sample from slightly narrower Gaussians. Strictly speaking, the lemma is not needed for
our results, and we could use instead the original sampler from [GPV08]. Using our exact sampler leads
to slightly cleaner proofs as well as a (minuscule) improvement in the parameters of our reductions, and we
include it here mainly in the hope that it finds further applications in the future.

Lemma 2.3. There is a probabilistic polynomial-time algorithm that, given a basis $\mathbf{B}$ of an $n$-dimensional
lattice $\Lambda = \mathcal{L}(\mathbf{B})$, $\mathbf{c} \in \mathbb{R}^n$, and a parameter $r \geq \|\tilde{\mathbf{B}}\| \cdot \sqrt{\ln(2n + 4)/\pi}$, outputs a sample distributed
according to $D_{\Lambda+\mathbf{c},r}$.

Here, $\tilde{\mathbf{B}}$ denotes the Gram-Schmidt orthogonalization of $\mathbf{B}$, and $\|\tilde{\mathbf{B}}\|$ is the length of the longest vector
in it. We recall the definition of the smoothing parameter from [MR04].

Definition 2.4. For a lattice $\Lambda$ and positive real $\varepsilon > 0$, the smoothing parameter $\eta_\varepsilon(\Lambda)$ is the smallest real
$s > 0$ such that $\rho_{1/s}(\Lambda^* \setminus \{\mathbf{0}\}) \leq \varepsilon$.


Lemma 2.5 ([GPV08, Lemma 3.1]). For any $\varepsilon > 0$ and $n$-dimensional lattice $\Lambda$ with basis $\mathbf{B}$,

$$\eta_\varepsilon(\Lambda) \leq \|\tilde{\mathbf{B}}\| \sqrt{\ln(2n(1 + 1/\varepsilon))/\pi}.$$
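
For the one-dimensional lattice $\Lambda = \mathbb{Z}$ (whose dual is again $\mathbb{Z}$), Definition 2.4 and the bound of Lemma 2.5 can be compared numerically. The following Python sketch is illustrative only: it truncates the Gaussian sum after a fixed number of terms and finds the smoothing parameter by bisection, which is adequate for $\varepsilon$ below 1. All function names are assumptions.

```python
import math


def rho_dual_nonzero(s, terms=50):
    """rho_{1/s}(Z minus {0}) = 2 * sum_{k>=1} exp(-pi k^2 s^2), truncated after `terms` terms."""
    return 2 * sum(math.exp(-math.pi * (k * s) ** 2) for k in range(1, terms + 1))


def smoothing_parameter_Z(eps, tol=1e-12):
    """Bisection for the smallest s > 0 with rho_{1/s}(Z minus {0}) <= eps."""
    lo, hi = 0.0, 1.0
    while rho_dual_nonzero(hi) > eps:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rho_dual_nonzero(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi


def lemma_2_5_bound(n, gs_norm, eps):
    """Right-hand side of Lemma 2.5 for a basis whose Gram-Schmidt norm is gs_norm."""
    return gs_norm * math.sqrt(math.log(2 * n * (1 + 1 / eps)) / math.pi)


eps = 0.01
eta = smoothing_parameter_Z(eps)
bound = lemma_2_5_bound(n=1, gs_norm=1.0, eps=eps)
```

For $\varepsilon = 0.01$ the computed value is just below 1.30 and sits slightly under the bound, which shows that Lemma 2.5 is nearly tight for this lattice.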


We  now  collect  some  known  facts  on  Gaussian  distributions  and  lattices. 

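Lemma 2.6 ([MR04, Lemma 4.1]). For any $n$-dimensional lattice $\Lambda$, $\varepsilon > 0$, $r \geq \eta_\varepsilon(\Lambda)$, the distribution of
$\mathbf{x} \bmod \Lambda$ where $\mathbf{x} \leftarrow D_r$ is within statistical distance $\varepsilon/2$ of the uniform distribution on cosets of $\Lambda$.

Lemma 2.7 ([Reg05, Claim 3.8]). For any $n$-dimensional lattice $\Lambda$, $\varepsilon > 0$, $r \geq \eta_\varepsilon(\Lambda)$, and $\mathbf{c} \in \mathbb{R}^n$, we
have $\rho_r(\Lambda + \mathbf{c}) \in \bigl[\frac{1-\varepsilon}{1+\varepsilon}, 1\bigr] \cdot \rho_r(\Lambda)$.

Lemma 2.8 ([Reg05, Claim 3.9]). Let $\Lambda$ be an $n$-dimensional lattice, let $\mathbf{u} \in \mathbb{R}^n$ be arbitrary, let $r, s > 0$
and let $t = \sqrt{r^2 + s^2}$. Assume that $rs/t = 1/\sqrt{1/r^2 + 1/s^2} \geq \eta_\varepsilon(\Lambda)$ for some $\varepsilon < 1/2$. Consider the
continuous distribution $Y$ on $\mathbb{R}^n$ obtained by sampling from $D_{\Lambda+\mathbf{u},r}$ and then adding a noise vector taken
from $D_s$. Then, the statistical distance between $Y$ and $D_t$ is at most $4\varepsilon$.

Lemma 2.9 ([Reg05, Corollary 3.10]). Let $\Lambda$ be an $n$-dimensional lattice, let $\mathbf{u}, \mathbf{z} \in \mathbb{R}^n$ be arbitrary, and
let $r, \alpha > 0$. Assume that $(1/r^2 + (\|\mathbf{z}\|/\alpha)^2)^{-1/2} \geq \eta_\varepsilon(\Lambda)$ for some $\varepsilon < 1/2$. Then the distribution of
$\langle \mathbf{z}, \mathbf{v} \rangle + e$ where $\mathbf{v} \leftarrow D_{\Lambda+\mathbf{u},r}$ and $e \leftarrow D_\alpha$, is within statistical distance $4\varepsilon$ of $D_\beta$ for $\beta = \sqrt{(r\|\mathbf{z}\|)^2 + \alpha^2}$.

Lemma 2.10 (Special case of [Pei10, Theorem 3.1]). Let $\Lambda$ be a lattice and $r, s > 0$ be such that $s \geq
\eta_\varepsilon(\Lambda)$ for some $\varepsilon \leq 1/2$. Then if we choose $\mathbf{x}$ from the continuous Gaussian $D_r$ and then choose $\mathbf{y}$ from the
discrete Gaussian $D_{\Lambda-\mathbf{x},s}$, then $\mathbf{x} + \mathbf{y}$ is within statistical distance $8\varepsilon$ of the discrete Gaussian $D_{\Lambda,(r^2+s^2)^{1/2}}$.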


2.3  Learning  with  Errors 

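For integers $n, q \geq 1$, an integer vector $\mathbf{s} \in \mathbb{Z}^n$, and a probability distribution $\phi$ on $\mathbb{R}$, let $A_{q,\mathbf{s},\phi}$ be the
distribution over $\mathbb{T}_q^n \times \mathbb{T}$ obtained by choosing $\mathbf{a} \in \mathbb{T}_q^n$ uniformly at random and an error term $e$ from $\phi$, and
outputting the pair $(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle + e) \in \mathbb{T}_q^n \times \mathbb{T}$.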
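
The distribution $A_{q,\mathbf{s},D_\alpha}$ is straightforward to sample. The Python sketch below is a minimal illustration; the function names and parameter values are assumptions. It stores $\mathbf{a} \in \mathbb{T}_q^n$ through its integer numerators and uses the fact that $D_\alpha$, with density proportional to $\exp(-\pi x^2/\alpha^2)$, is a normal distribution with standard deviation $\alpha/\sqrt{2\pi}$.

```python
import math
import random


def gaussian_D(alpha, rng):
    """Sample from D_alpha, i.e. a normal law with standard deviation alpha / sqrt(2*pi)."""
    return rng.gauss(0.0, alpha / math.sqrt(2 * math.pi))


def lwe_sample(s, q, alpha, rng):
    """One sample (a, b) from A_{q,s,D_alpha}; a holds integers j_i with a_i = j_i / q."""
    a = [rng.randrange(q) for _ in s]
    inner = sum(ai * si for ai, si in zip(a, s)) % q      # q * <a, s> mod q, kept exact
    b = (inner / q + gaussian_D(alpha, rng)) % 1.0
    return a, b


def centered_error(a, b, s, q):
    """Recover b - <a, s> mod 1 as a representative in [-1/2, 1/2)."""
    inner = sum(ai * si for ai, si in zip(a, s)) % q
    return (b - inner / q + 0.5) % 1.0 - 0.5


rng = random.Random(0)
q, alpha = 257, 0.01                      # hypothetical toy parameters
s = [rng.randrange(q) for _ in range(8)]
samples = [lwe_sample(s, q, alpha, rng) for _ in range(200)]
errors = [centered_error(a, b, s, q) for a, b in samples]
```

Knowing $\mathbf{s}$, the recovered errors concentrate around zero at scale $\alpha$; without $\mathbf{s}$, the decision problem asks to tell such pairs from uniform ones.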

Definition 2.11. For integers $n, q \geq 1$, an error distribution $\phi$ over $\mathbb{R}$, and a distribution $\mathcal{D}$ over $\mathbb{Z}^n$,
the (average-case) decision variant of the LWE problem, denoted $\mathrm{LWE}_{n,q,\phi}(\mathcal{D})$, is to distinguish, given
arbitrarily many independent samples, the uniform distribution over $\mathbb{T}_q^n \times \mathbb{T}$ from $A_{q,\mathbf{s},\phi}$ for a fixed $\mathbf{s}$ sampled
from $\mathcal{D}$. The variant where the algorithm only gets a bounded number of samples $m \in \mathbb{N}$ is denoted
$\mathrm{LWE}_{n,m,q,\phi}(\mathcal{D})$.

Notice that the distribution $A_{q,\mathbf{s},\phi}$ only depends on $\mathbf{s} \bmod q$, and so one can assume without loss of
generality that $\mathbf{s} \in \{0, \ldots, q-1\}^n$. Moreover, using a standard random self-reduction, for any distribution
over secrets $\mathcal{D}$, one can reduce $\mathrm{LWE}_{n,q,\phi}(\mathcal{D})$ to $\mathrm{LWE}_{n,q,\phi}(U(\{0, \ldots, q-1\}^n))$, and we will occasionally
use $\mathrm{LWE}_{n,q,\phi}$ to denote the latter (as is common in previous work). When the noise is a Gaussian with
parameter $\alpha > 0$, i.e., $\phi = D_\alpha$, we use the shorthand $\mathrm{LWE}_{n,q,\alpha}(\mathcal{D})$. Since the case when $\mathcal{D}$ is uniform
over $\{0, 1\}^n$ plays an important role in this paper, we will denote it by $\mathrm{binLWE}_{n,q,\phi}$ (and by $\mathrm{binLWE}_{n,m,q,\phi}$
when the algorithm only gets $m$ samples). Finally, as we show in the following lemma, one can efficiently
reduce LWE to the case in which the secret is distributed according to the (discretized) error distribution
and is hence somewhat short. This latter form of LWE, known as the "normal form," was first shown hard
in [ACPS09] for the case of prime $q$. Here we observe that the proof extends to non-prime $q$, the new
technical ingredient being Claim 2.13 below.
Lemma 2.12. For any $q \geq 25$, $n, m \geq 1$, $\alpha > 0$, $\varepsilon < 1/2$ and $s \geq \sqrt{\ln(2n(1 + 1/\varepsilon))/\pi}/q$, there is an
efficient (transformation) reduction from $\mathrm{LWE}_{n,m,q,\alpha}$ to $\mathrm{LWE}_{n,m',q,\alpha}(\mathcal{D})$ where $m' = m - (16n + 4\ln\ln q)$
and $\mathcal{D} = D_{\mathbb{Z}^n,\, q(\alpha^2+s^2)^{1/2}}$, that turns advantage $\zeta$ into an advantage of at least $(\zeta - 8\varepsilon)/4$. In particular,
assuming $\alpha \geq \sqrt{\ln(2n(1 + 1/\varepsilon))/\pi}/q$, we can take $s = \alpha$, in which case $\mathcal{D} = D_{\mathbb{Z}^n,\, \sqrt{2}\alpha q}$.

Proof. Consider the first $16n + 4\ln\ln q$ samples $(\mathbf{a}, b)$. Using Claim 2.13, with probability at least $1 -
2e^{-1} \geq 1/4$, we can efficiently find a subsequence of the samples such that the matrix $\mathbf{A}_0 \in \mathbb{Z}_q^{n \times n}$ whose
columns are formed by the $\mathbf{a}$ in the subset (scaled up by $q$) has an inverse $\mathbf{A}_0^{-1} \in \mathbb{Z}_q^{n \times n}$ modulo $q$. If we
cannot find such a subsequence, we abort. Let $\mathbf{b}_0 \in \mathbb{T}^n$ be the vector formed by the corresponding $b$ in the
subsequence. Let also $\mathbf{b}'_0 \in \mathbb{T}_q^n$ be $\mathbf{b}_0 + \mathbf{x}$ where $\mathbf{x}$ is chosen from $D_{q^{-1}\mathbb{Z}^n - \mathbf{b}_0, s}$. (Notice that the coset
$q^{-1}\mathbb{Z}^n - \mathbf{b}_0$ is well defined because $\mathbf{b}_0$ is a coset of $\mathbb{Z}^n \subseteq q^{-1}\mathbb{Z}^n$.) From each of the remaining $m'$ samples
$(\mathbf{a}, b) \in \mathbb{T}_q^n \times \mathbb{T}$ we produce a pair

$$(\mathbf{a}' = \mathbf{A}_0^{-1}\mathbf{a},\ b' = b - \langle \mathbf{A}_0^{-1} \cdot q\mathbf{a}, \mathbf{b}'_0 \rangle) \in \mathbb{T}_q^n \times \mathbb{T}.$$

We then apply the given LWE oracle to the resulting $m'$ pairs and output its result.

We now analyze the reduction. First notice that the construction of $\mathbf{A}_0$ depends only on the $\mathbf{a}$ component
of the input samples, and hence the probability of finding it is the same in case the input is uniform and in
case it consists of LWE samples. It therefore suffices in the following to show that there is a distinguishing
gap conditioned on successfully finding an $\mathbf{A}_0$. To that end, first observe that if the input samples $(\mathbf{a}, b)$ are
uniform in $\mathbb{T}_q^n \times \mathbb{T}$ then so are the output samples $(\mathbf{a}', b')$. Next consider the case that the input samples
are distributed according to $A_{q,\mathbf{s},D_\alpha}$ for some $\mathbf{s} \in \mathbb{Z}^n$. Then since $s \geq \eta_\varepsilon(q^{-1}\mathbb{Z}^n)$ by Lemma 2.5, using
Lemma 2.10 we get that $\mathbf{b}'_0 = q^{-1}\mathbf{A}_0^T\mathbf{s} + \mathbf{e}_0$ where $\mathbf{e}_0$ is distributed within statistical distance $8\varepsilon$ from
$D_{q^{-1}\mathbb{Z}^n,(\alpha^2+s^2)^{1/2}}$. Therefore, for each output sample $(\mathbf{a}', b')$ we have

$$b' = b - \langle \mathbf{A}_0^{-1} \cdot q\mathbf{a}, \mathbf{b}'_0 \rangle = \langle \mathbf{a}, \mathbf{s} \rangle + e - \langle \mathbf{a}, \mathbf{s} \rangle - \langle \mathbf{A}_0^{-1} q\mathbf{a}, \mathbf{e}_0 \rangle = \langle -q\mathbf{e}_0, \mathbf{a}' \rangle + e,$$

where $e$ is an independent error from $D_\alpha$. Therefore, the output samples are distributed according to
$A_{q,-q\mathbf{e}_0,D_\alpha}$, completing the proof. □

Claim 2.13. For any $q \geq 25$, $n \geq 1$, and $t_1 \geq 4$, $t_2 \geq 1$, given a sequence of $t_1 n + t_2 \ln\ln q$ vectors
$\mathbf{a}_1, \mathbf{a}_2, \ldots$ chosen uniformly and independently from $\mathbb{Z}_q^n$, except with probability $e^{-t_1 n/16} + e^{-t_2/4}$, there
exists a subsequence of $n$ vectors such that the $n \times n$ matrix they form is invertible modulo $q$. Moreover,
such a subsequence can be found efficiently.

Proof. We consider the following procedure. Let $k$ be a counter, initialized to 0, indicating the number
of vectors currently in the subsequence, and let $\mathbf{A} \in \mathbb{Z}_q^{n \times k}$ be the matrix whose columns are formed by
the current subsequence. We also maintain a unimodular matrix $\mathbf{U} \in \mathbb{Z}^{n \times n}$, initially set to the identity,
satisfying the invariant that $\mathbf{U} \cdot \mathbf{A} \in \mathbb{Z}_q^{n \times k}$ has the following form: its top $k \times k$ submatrix is upper triangular
with each diagonal coefficient coprime with $q$; its bottom $(n - k) \times k$ submatrix is zero. The procedure
considers the vectors $\mathbf{a}_i$ one by one. For each vector $\mathbf{a}$, if it is such that the gcd of the last $n - k$ entries
of $\mathbf{U}\mathbf{a}$, call it $g$, is coprime with $q$, then it does the following: it adds $\mathbf{a}$ to the subsequence, computes (using,
say, the extended GCD algorithm) a unimodular matrix $\mathbf{V}$ that acts as identity on the first $k$ coordinates and
for which the last $n - k$ coordinates of $\mathbf{V}\mathbf{U}\mathbf{a}$ are $(g, 0, \ldots, 0)$, replaces $\mathbf{U}$ with $\mathbf{V}\mathbf{U}$, and increments $k$.

It is easy to see that the procedure's output is correct if it reaches $k = n$. It therefore suffices to analyze
the probability that this event happens. For this we use the following two facts to handle the cases $k < n - 1$
and $k = n - 1$, respectively. First, the probability that the gcd of two uniformly random numbers modulo $q$
is coprime with $q$ is

$$\prod_{p \mid q,\ p \text{ prime}} (1 - p^{-2}) \geq \prod_{p \text{ prime}} (1 - p^{-2}) = \zeta(2)^{-1} \approx 0.61,$$

where $\zeta$ is the Riemann zeta function. Second, the probability that one uniformly random number modulo $q$
is coprime with $q$ is $\varphi(q)/q$, where $\varphi$ is Euler's totient function. By [BS96, Theorem 8.8.7], this probability
is at least $(e^\gamma \ln\ln q + 3/(\ln\ln q))^{-1}$ where $\gamma$ is Euler's constant, which for $q \geq 25$ is at least $(4\ln\ln q)^{-1}$.
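
Both facts are easy to sanity-check numerically. The Python sketch below is an illustration with assumed helper names: it evaluates $\varphi(q)/q$ against the bound $(4\ln\ln q)^{-1}$ over a finite range of moduli and computes $\zeta(2)^{-1} = 6/\pi^2$. A finite scan is of course not a proof.

```python
import math


def totient(q):
    """Euler's totient function via trial-division factorisation."""
    result, m, p = q, q, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


# Smallest slack between phi(q)/q and 1/(4 ln ln q) over the scanned range.
worst_gap = min(totient(q) / q - 1 / (4 * math.log(math.log(q))) for q in range(25, 5000))

# zeta(2) = pi^2 / 6, so the infinite product over all primes equals 6 / pi^2.
zeta2_inverse = 6 / math.pi ** 2
```

The slack stays positive on the whole scanned range, and `zeta2_inverse` is about 0.608, matching the constant quoted above.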

Using the (multiplicative) Chernoff bound, the first fact, and the fact that $\mathbf{U}\mathbf{a}$ is uniform in $\mathbb{Z}_q^n$ since $\mathbf{U}$
is unimodular, we see that the probability that $k < n - 1$ after considering $t_1 n$ vectors is at most $e^{-t_1 n/16}$.
Moreover, once $k = n - 1$, using the second fact we get that the probability that after considering $t_2 \ln\ln q$
additional vectors we still have $k = n - 1$ is at most $e^{-t_2/4}$. □
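
The greedy procedure from the proof of Claim 2.13 can be sketched in Python as follows. This is an illustrative rendering, not the authors' code: the function names are assumptions, the matrix $\mathbf{U}$ is kept reduced modulo $q$ to avoid coefficient growth (it then stays invertible modulo $q$, which is all the invariant needs), and the matrix $\mathbf{V}$ is applied implicitly as a sequence of 2-by-2 row operations of determinant 1 obtained from the extended GCD.

```python
import random
from math import gcd


def egcd(x, y):
    """Return (g, s, t) with s*x + t*y == g == gcd(x, y), for non-negative x, y."""
    old_r, r, old_s, s, old_t, t = x, y, 1, 0, 0, 1
    while r:
        quo = old_r // r
        old_r, r = r, old_r - quo * r
        old_s, s = s, old_s - quo * s
        old_t, t = t, old_t - quo * t
    return old_r, old_s, old_t


def find_invertible_subsequence(vectors, n, q):
    """Indices of n vectors forming a matrix invertible mod q, or None if the pass fails."""
    U = [[int(i == j) for j in range(n)] for i in range(n)]
    chosen = []
    for idx, a in enumerate(vectors):
        k = len(chosen)
        if k == n:
            break
        w = [sum(U[i][j] * a[j] for j in range(n)) % q for i in range(n)]
        g = 0
        for entry in w[k:]:
            g = gcd(g, entry)
        if gcd(g, q) != 1:
            continue                      # tail gcd shares a factor with q: skip this vector
        # Fold the tail of w into (g, 0, ..., 0) by determinant-1 row operations on rows >= k.
        for i in range(n - 1, k, -1):
            x, y = w[i - 1], w[i]
            if y == 0:
                continue
            d, s, t = egcd(x, y)
            upper, lower = U[i - 1], U[i]
            U[i - 1] = [(s * u + t * v) % q for u, v in zip(upper, lower)]
            U[i] = [(-(y // d) * u + (x // d) * v) % q for u, v in zip(upper, lower)]
            w[i - 1], w[i] = d, 0
        chosen.append(idx)
    return chosen if len(chosen) == n else None


def det(M):
    """Integer determinant by Laplace expansion; only meant for tiny matrices."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


rng = random.Random(1)
n, q = 4, 36                               # hypothetical toy parameters
vectors = [[rng.randrange(q) for _ in range(n)] for _ in range(80)]
chosen = find_invertible_subsequence(vectors, n, q)
```

The determinant of the selected vectors is coprime with $q$ exactly when the matrix is invertible modulo $q$, which gives an independent check of the output.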


Unknown (Bounded) Noise Rate. We also consider a variant of LWE in which the amount of noise is
some unknown $\beta \leq \alpha$ (as opposed to exactly $\alpha$), with $\beta$ possibly depending on the secret $\mathbf{s}$. As the
following lemma shows, this does not make the problem significantly harder.

Definition 2.14. For integers $n, q \geq 1$ and $\alpha \in (0, 1)$, $\mathrm{LWE}_{n,q,\leq\alpha}$ is the problem of solving $\mathrm{LWE}_{n,q,\beta}$ for
any $\beta = \beta(\mathbf{s}) \leq \alpha$.

Lemma 2.15. Let $\mathcal{A}$ be an algorithm for $\mathrm{LWE}_{n,m,q,\alpha}$ with advantage at least $\varepsilon > 0$. Then there exists an
algorithm $\mathcal{B}$ for $\mathrm{LWE}_{n,m',q,\leq\alpha}$ using oracle access to $\mathcal{A}$ and with advantage at least $1/3$, where both $m'$ and
its running time are $\mathrm{poly}(m, 1/\varepsilon, n, \log q)$.

The proof is standard (see, e.g., [Reg05, Lemma 3.7] for the analogous statement for the search version
of LWE). The idea is to use the Chernoff bound to estimate $\mathcal{A}$'s success probability on the uniform distribution,
and then add noise in small increments to our given distribution and estimate $\mathcal{A}$'s behavior on the resulting
distributions. If there is a gap between any of these and the uniform behavior, the input distribution is
deemed non-uniform. The full proof is omitted.


Relation to Lattice Problems. Regev [Reg05] and Peikert [Pei09] showed quantum and classical reductions
(respectively) from the worst-case hardness of the GapSVP problem to the search version of LWE.
(We note that the quantum reduction in [Reg05] also shows a reduction from SIVP.) As mentioned in the
introduction, the classical reduction only works when the modulus $q$ is exponential in the dimension $n$. This
is summarized in the following theorem, which is derived from [Reg05, Theorem 3.1] and [Pei09, Theorem 3.1].

Theorem 2.16. Let $n, q \geq 1$ be integers and let $\alpha \in (0, 1)$ be such that $\alpha q \geq 2\sqrt{n}$. Then there exists a
quantum reduction from worst-case $n$-dimensional GapSVP$_{\tilde{O}(n/\alpha)}$ to $\mathrm{LWE}_{n,q,\alpha}$. If in addition $q \geq 2^{n/2}$
then there is also a classical reduction between those problems.


In order to obtain hardness of the decision version of LWE, which is the one we consider throughout
the paper, one employs a search-to-decision reduction. Several such reductions appear in the literature
(e.g., [Reg05, Pei09, MP12]). The most recent reduction by Micciancio and Peikert [MP12], which essentially
subsumes all previous reductions, requires the modulus $q$ to be smooth. Below we give the special
case when the modulus is a power of 2, which suffices for our purposes. It follows from our results that
(decision) LWE is hard not just for a smooth modulus $q$, as follows from [MP12], but actually for all moduli $q$,
including prime moduli, with only a small deterioration in the noise (see Corollary 3.3).

Theorem 2.17 (Special case of [MP12, Theorem 3.1]). Let $q$ be a power of 2, and $\alpha$ satisfy $1/q < \alpha <
1/\omega(\sqrt{\log n})$. Then there exists an efficient reduction from search $\mathrm{LWE}_{n,q,\alpha}$ to (decision) $\mathrm{LWE}_{n,q,\alpha'}$ for
$\alpha' = \alpha \cdot \omega(\log n)$.

3  Modulus-Dimension  Switching 

The main results of this section are Corollaries 3.2 and 3.4 below. Both are special cases of the following
technical theorem. We say that a distribution $\mathcal{D}$ over $\mathbb{Z}^n$ is $(B, \delta)$-bounded for some reals $B, \delta \geq 0$ if the
probability that $\mathbf{x} \leftarrow \mathcal{D}$ has norm greater than $B$ is at most $\delta$.

Theorem 3.1. Let $m, n, n', q, q' \geq 1$ be integers, let $\mathbf{G} \in \mathbb{Z}^{n' \times n}$ be such that the lattice $\Lambda = \frac{1}{q'}\mathbf{G}^T\mathbb{Z}^{n'} + \mathbb{Z}^n$
has a known basis $\mathbf{B}$, and let $\mathcal{D}$ be an arbitrary $(B, \delta)$-bounded distribution over $\mathbb{Z}^n$. Let $\alpha, \beta > 0$ and $\varepsilon \in
(0, 1/2)$ satisfy

$$\beta^2 \geq \alpha^2 + (4/\pi) \ln(2n(1 + 1/\varepsilon)) \cdot (\max\{q^{-1}, \|\tilde{\mathbf{B}}\|\} \cdot B)^2 .$$

Then there is an efficient (transformation) reduction from $\mathrm{LWE}_{n,m,q,\leq\alpha}(\mathcal{D})$ to $\mathrm{LWE}_{n',m,q',\leq\beta}(\mathbf{G} \cdot \mathcal{D})$ that
reduces the advantage by at most $\delta + 14\varepsilon m$.

Here we use the notation $\|\tilde{\mathbf{B}}\|$ from Lemma 2.3. We also note that if needed, the distribution on secrets
produced by the reduction can always be turned into the uniform distribution on $\mathbb{Z}_{q'}^{n'}$, as mentioned after
Definition 2.11. Also, we recall that there exists an elementary reduction from $\mathrm{LWE}_{n',q',\leq\beta}$ to $\mathrm{LWE}_{n',q',\beta}$
(see Lemma 2.15).

Here we state two important corollaries of the theorem. The first corresponds to just modulus reduction
(the LWE dimension is preserved), and is obtained by letting $n' = n$, $\mathbf{G} = \mathbf{I}$ be the $n$-dimensional identity
matrix, and $\mathbf{B} = \mathbf{I}/q'$. For example, we can take $q \geq q' \geq \sqrt{2\ln(2n(1 + 1/\varepsilon))} \cdot (B/\alpha)$ and $\beta = \sqrt{2}\alpha$,
which corresponds to reducing an arbitrary modulus to almost $B/\alpha$, while increasing the initial error rate $\alpha$
by just a small constant factor.

Corollary 3.2. For any $m, n \geq 1$, $q \geq q' \geq 1$, $(B, \delta)$-bounded distribution $\mathcal{D}$ over $\mathbb{Z}^n$, $\alpha, \beta > 0$ and $\varepsilon \in
(0, 1/2)$ such that

$$\beta^2 \geq \alpha^2 + (4/\pi) \ln(2n(1 + 1/\varepsilon)) \cdot (B/q')^2,$$

there is an efficient reduction from $\mathrm{LWE}_{n,m,q,\leq\alpha}(\mathcal{D})$ to $\mathrm{LWE}_{n,m,q',\leq\beta}(\mathcal{D})$ that reduces the advantage by at
most $\delta + 14\varepsilon m$.
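
The noise growth in Corollary 3.2 is simple to evaluate for concrete parameters. The following Python sketch is illustrative: the function names and the parameter values are assumptions. It computes the smallest admissible $\beta$ and the example modulus $q' = \sqrt{2\ln(2n(1 + 1/\varepsilon))} \cdot (B/\alpha)$ mentioned before the corollary.

```python
import math


def modulus_switch_noise(alpha, n, B, q_prime, eps):
    """Smallest beta with beta^2 = alpha^2 + (4/pi) ln(2n(1 + 1/eps)) (B/q')^2."""
    extra = (4 / math.pi) * math.log(2 * n * (1 + 1 / eps)) * (B / q_prime) ** 2
    return math.sqrt(alpha ** 2 + extra)


def modulus_for_sqrt2_growth(alpha, n, B, eps):
    """The example choice q' = sqrt(2 ln(2n(1 + 1/eps))) * (B / alpha)."""
    return math.sqrt(2 * math.log(2 * n * (1 + 1 / eps))) * (B / alpha)


n, alpha, eps = 256, 0.001, 2.0 ** -40     # hypothetical toy parameters
B = math.sqrt(n)                           # e.g. a secret in {0,1}^n has norm at most sqrt(n)
q_prime = math.ceil(modulus_for_sqrt2_growth(alpha, n, B, eps))
beta = modulus_switch_noise(alpha, n, B, q_prime, eps)
```

With this choice of $q'$ the computed $\beta$ lies between $\alpha$ and $\sqrt{2}\alpha$, in line with the statement that the error rate grows by only a small constant factor.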

In particular, by using the normal form of LWE (Lemma 2.12), in which the secret has distribution $\mathcal{D} =
D_{\mathbb{Z}^n,\sqrt{2}\alpha q}$, we can switch to a power-of-2 modulus with only a small loss in the noise rate, as described in the
following corollary. Together with the known search-to-decision reduction (Theorem 2.17), this extends the
known hardness of (decision) LWE to any modulus $q$. Here we use that $\mathcal{D} = D_{\mathbb{Z}^n,r}$ is $(Cr\sqrt{n\log(n/\delta)}, \delta)$-
bounded for some universal constant $C > 0$, which follows by taking a union bound over the $n$ coordinates.
(Alternatively, one could use that it is $(r\sqrt{n}, 2^{-n})$-bounded, as follows from [Ban93, Lemma 1.5], leading
to a slightly tighter statement for large $n$.)

Corollary 3.3. Let $\delta \in (0, 1/2)$, $m \geq n \geq 1$, $q' \geq 25$. Let also $q \in [q', 2q')$ be the smallest power of 2
not smaller than $q'$ and $\alpha \geq \sqrt{\ln(2n(1 + 16/\delta))/\pi}/q$. There exists an efficient (transformation) reduction
from $\mathrm{LWE}_{n,m,q,\alpha}$ to $\mathrm{LWE}_{n,m',q',\leq\beta}$ where $m' = m - (16n + 4\ln\ln q)$ and

$$\beta = C\alpha\sqrt{n}\sqrt{\log(n/\delta)\log(m/\delta)}$$

for some universal constant $C > 0$, that turns advantage $\zeta$ into an advantage of at least $(\zeta - \delta)/4$.
Another corollary illustrates a modulus-dimension tradeoff. Assume $n = kn'$ for some $k \geq 1$, and
let $q' = q^k$. Let $\mathbf{G} = \mathbf{I}_{n'} \otimes \mathbf{g}$, where $\mathbf{g} = (1, q, q^2, \ldots, q^{k-1})^T \in \mathbb{Z}^k$. We then have $\Lambda = q^{-k}\mathbf{G}^T\mathbb{Z}^{n'} + \mathbb{Z}^n$.
A basis of $\Lambda$ is given by

$$\mathbf{B} = \mathbf{I}_{n'} \otimes \begin{pmatrix} q^{-1} & q^{-2} & \cdots & q^{-k} \\ & q^{-1} & \cdots & q^{1-k} \\ & & \ddots & \vdots \\ & & & q^{-1} \end{pmatrix};$$

this is since the column vectors of $\mathbf{B}$ belong to $\Lambda$ and the determinants match. Orthogonalizing from left
to right, we have $\tilde{\mathbf{B}} = q^{-1}\mathbf{I}$ and so $\|\tilde{\mathbf{B}}\| = q^{-1}$. We therefore obtain the following corollary, showing
that we can trade off the dimension against the modulus, holding $n \log q = n' \log q'$ fixed. For example,
letting $\mathcal{D} = D_{\mathbb{Z}^n,\alpha q}$ (corresponding to a secret in normal form, see Lemma 2.12), which is $(\alpha q\sqrt{n}, 2^{-n})$-
bounded, the reduction increases the error rate by about a $\sqrt{n}$ factor.


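Corollary 3.4. For any $n, m, q \geq 1$, $k \geq 1$ that divides $n$, $(B, \delta)$-bounded distribution $\mathcal{D}$ over $\mathbb{Z}^n$, $\alpha, \beta > 0$,
and $\varepsilon \in (0, 1/2)$ such that

$$\beta^2 \geq \alpha^2 + (4/\pi) \ln(2n(1 + 1/\varepsilon)) \cdot (B/q)^2,$$

there is an efficient reduction from $\mathrm{LWE}_{n,m,q,\leq\alpha}(\mathcal{D})$ to $\mathrm{LWE}_{n/k,m,q^k,\leq\beta}(\mathbf{G} \cdot \mathcal{D})$ that reduces the advantage
by at most $\delta + 14\varepsilon m$, where $\mathbf{G} = \mathbf{I}_{n/k} \otimes (1, q, q^2, \ldots, q^{k-1})^T$.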
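
The effect of the matrix $\mathbf{G}$ on a secret can be illustrated with exact rational arithmetic. In the Python sketch below the helper names are assumptions, and the gadget vector $(1, q, \ldots, q^{k-1})$ is treated as a row within each block so that $\mathbf{G}$ has $n/k$ rows and $n$ columns, the shape required in Theorem 3.1. Each block of $k$ secret coordinates is packed into one coordinate modulo $q^k$, and the identity $\langle \mathbf{a}', \mathbf{G}\mathbf{s} \rangle = \langle \mathbf{G}^T\mathbf{a}', \mathbf{s} \rangle \bmod 1$ is checked on an example.

```python
from fractions import Fraction


def gadget_apply(s, q, k):
    """Compute G*s: each block of k coordinates of s is combined with weights 1, q, ..., q^(k-1)."""
    assert len(s) % k == 0
    return [sum(q ** j * s[i + j] for j in range(k)) for i in range(0, len(s), k)]


def gadget_transpose_apply(a_prime, q, k):
    """Compute G^T a' reduced mod 1: entry x of a' expands to (x, q*x, ..., q^(k-1)*x)."""
    return [(q ** j * x) % 1 for x in a_prime for j in range(k)]


def inner_mod1(x, y):
    """Inner product reduced modulo 1."""
    return sum(xi * yi for xi, yi in zip(x, y)) % 1


q, k = 5, 3                                   # so q' = q^k = 125
s = [1, 0, 4, 2, 3, 1]                        # hypothetical secret in Z^n with n = 6
a_prime = [Fraction(37, q ** k), Fraction(101, q ** k)]

Gs = gadget_apply(s, q, k)
lhs = inner_mod1(a_prime, Gs)
rhs = inner_mod1(gadget_transpose_apply(a_prime, q, k), s)
```

Both sides agree exactly, which is why a sample with respect to the short secret $\mathbf{s}$ in dimension $n$ and modulus $q$ can be re-expressed with respect to $\mathbf{G}\mathbf{s}$ in dimension $n/k$ and modulus $q^k$.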

Theorem 3.1 follows immediately from the following lemma.

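Lemma 3.5. Adopt the notation of Theorem 3.1, and let

$$r \geq \max\{q^{-1}, \|\tilde{\mathbf{B}}\|\} \cdot \sqrt{2\ln(2n(1 + 1/\varepsilon))/\pi}.$$

There is an efficient mapping from $\mathbb{T}_q^n \times \mathbb{T}$ to $\mathbb{T}_{q'}^{n'} \times \mathbb{T}$, which has the following properties:

• If the input is uniformly random, then the output is within statistical distance $4\varepsilon$ from the uniform
distribution.

• If the input is distributed according to $A_{q,\mathbf{s},D_\alpha}$ for some $\mathbf{s} \in \mathbb{Z}^n$ with $\|\mathbf{s}\| \leq B$, then the output
distribution is within statistical distance $10\varepsilon$ from $A_{q',\mathbf{G}\mathbf{s},D_{\alpha'}}$, where $(\alpha')^2 = \alpha^2 + r^2(\|\mathbf{s}\|^2 + B^2) \leq
\alpha^2 + 2(rB)^2$.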

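Proof. The main idea behind the reduction is to encode $\mathbb{T}_q^n$ into $\mathbb{T}_{q'}^{n'}$, so that the mod-1 inner products
between vectors in $\mathbb{T}_q^n$ and a short vector $\mathbf{s} \in \mathbb{Z}^n$, and between vectors in $\mathbb{T}_{q'}^{n'}$ and $\mathbf{G}\mathbf{s} \in \mathbb{Z}^{n'}$, are nearly
equivalent. In a bit more detail, the reduction will map its input vector $\mathbf{a} \in \mathbb{T}_q^n$ (from the given LWE-or-
uniform distribution) to a vector $\mathbf{a}' \in \mathbb{T}_{q'}^{n'}$, so that

$$\langle \mathbf{a}', \mathbf{G}\mathbf{s} \rangle = \langle \mathbf{G}^T\mathbf{a}', \mathbf{s} \rangle \approx \langle \mathbf{a}, \mathbf{s} \rangle \bmod 1$$

for any (unknown) $\mathbf{s} \in \mathbb{Z}^n$. To do this, it randomly samples $\mathbf{a}'$ so that $\mathbf{G}^T\mathbf{a}' \approx \mathbf{a} \bmod \mathbb{Z}^n$, where the
approximation error will be a discrete Gaussian of parameter $r$.

We can now formally define the reduction, which works as follows. On an input pair $(\mathbf{a}, b) \in \mathbb{T}_q^n \times \mathbb{T}$, it
does the following:

• Choose $\mathbf{f} \leftarrow D_{\Lambda-\mathbf{a},r}$ using Lemma 2.3 with basis $\mathbf{B}$, and let $\mathbf{v} = \mathbf{a} + \mathbf{f} \in \Lambda/\mathbb{Z}^n$. (The coset $\Lambda - \mathbf{a}$
is well defined since $\mathbf{a} = \bar{\mathbf{a}} + \mathbb{Z}^n$ is some coset of $\mathbb{Z}^n \subseteq \Lambda$.) Choose a uniformly random solution
$\mathbf{a}' \in \mathbb{T}_{q'}^{n'}$ to the equation $\mathbf{G}^T\mathbf{a}' = \mathbf{v} \bmod \mathbb{Z}^n$. This can be done by computing a basis of the solution
set $\mathbf{G}^T\mathbf{a}' = \mathbf{0} \bmod \mathbb{Z}^n$, and adding a uniform element from that set to an arbitrary solution to the
equation $\mathbf{G}^T\mathbf{a}' = \mathbf{v} \bmod \mathbb{Z}^n$.

• Choose $e' \leftarrow D_{rB}$ and let $b' = b + e' \in \mathbb{T}$.

• Output $(\mathbf{a}', b')$.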
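
The three steps above can be sketched in Python for the pure modulus-reduction case $\mathbf{G} = \mathbf{I}$, $\mathbf{B} = \mathbf{I}/q'$, where $\Lambda = q'^{-1}\mathbb{Z}^n$ is orthogonal and the discrete Gaussian factors into independent one-dimensional samples. This is a simplified illustration under stated assumptions, not the general reduction: the function names are hypothetical, and the one-dimensional sampler enumerates a truncated window around the target instead of using the exact sampler of Lemma 2.3.

```python
import math
import random


def gauss_param(alpha, rng):
    """Continuous D_alpha: density proportional to exp(-pi x^2 / alpha^2)."""
    return rng.gauss(0.0, alpha / math.sqrt(2 * math.pi))


def round_to_coset(a, q_prime, r, rng, tail=6.0):
    """Sample an integer m with weight rho_r(m/q' - a), so f = m/q' - a follows D_{Z/q' - a, r}
    up to the truncated tail."""
    base = math.floor(a * q_prime)
    width = math.ceil(tail * r * q_prime) + 1
    candidates = range(base - width, base + width + 1)
    weights = [math.exp(-math.pi * ((m / q_prime - a) / r) ** 2) for m in candidates]
    return rng.choices(candidates, weights=weights)[0]


def switch_modulus(sample, q, q_prime, r, B, rng):
    """Map (a, b) in T_q^n x T to (a', b') in T_{q'}^n x T; vectors hold integer numerators."""
    a, b = sample
    a_prime = [round_to_coset(j / q, q_prime, r, rng) % q_prime for j in a]
    b_prime = (b + gauss_param(r * B, rng)) % 1.0      # extra noise e' from D_{rB}
    return a_prime, b_prime


rng = random.Random(0)
n, q, q_prime, alpha, eps = 16, 2 ** 20, 2 ** 10, 0.002, 2.0 ** -20   # toy parameters
s = [rng.randrange(2) for _ in range(n)]               # short (binary) secret
B = math.sqrt(n)
r = max(1 / q, 1 / q_prime) * math.sqrt(2 * math.log(2 * n * (1 + 1 / eps)) / math.pi)
alpha_out = math.sqrt(alpha ** 2 + 2 * (r * B) ** 2)   # upper bound on the output noise rate

errors = []
for _ in range(200):
    a = [rng.randrange(q) for _ in range(n)]
    b = (sum(x * y for x, y in zip(a, s)) % q / q + gauss_param(alpha, rng)) % 1.0
    a2, b2 = switch_modulus((a, b), q, q_prime, r, B, rng)
    inner = sum(x * y for x, y in zip(a2, s)) % q_prime / q_prime
    errors.append((b2 - inner + 0.5) % 1.0 - 0.5)
```

The output pairs live modulo $q' = 2^{10}$ instead of $q = 2^{20}$, and their error with respect to the same secret stays at the scale of `alpha_out`, as the second property of Lemma 3.5 predicts.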

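We now analyze the reduction. First, if the distribution of the input is uniform, then it suffices to show
that $\mathbf{a}'$ is (nearly) uniformly random, because both $b$ and $e'$ are independent of $\mathbf{a}'$, and $b \in \mathbb{T}$ is uniform.
To prove this claim, notice that it suffices to show that the coset $\mathbf{v} \in \Lambda/\mathbb{Z}^n$ is (nearly) uniformly random,
because each $\mathbf{v}$ has the same number of solutions $\mathbf{a}'$ to $\mathbf{G}^T\mathbf{a}' = \mathbf{v} \bmod \mathbb{Z}^n$. Next, observe that for any
$\bar{\mathbf{a}} \in \mathbb{T}_q^n$ and $\bar{\mathbf{f}} \in \Lambda - \bar{\mathbf{a}}$, we have by Lemma 2.7 (using that $r \geq \eta_\varepsilon(\Lambda)$ by Lemma 2.5) that

$$\Pr[\mathbf{a} = \bar{\mathbf{a}} \wedge \mathbf{f} = \bar{\mathbf{f}}] = q^{-n} \cdot \rho_r(\bar{\mathbf{f}})/\rho_r(\Lambda - \bar{\mathbf{a}}) \in C\bigl[1, \tfrac{1+\varepsilon}{1-\varepsilon}\bigr] \cdot \rho_r(\bar{\mathbf{f}}), \qquad (3.1)$$

where $C = q^{-n}/\rho_r(\Lambda)$ is a normalizing value that does not depend on $\bar{\mathbf{a}}$ or $\bar{\mathbf{f}}$. Therefore, by summing over
all $\bar{\mathbf{a}}, \bar{\mathbf{f}}$ satisfying $\bar{\mathbf{a}} + \bar{\mathbf{f}} = \bar{\mathbf{v}}$, we obtain that for any $\bar{\mathbf{v}} \in \Lambda/\mathbb{Z}^n$,

$$\Pr[\mathbf{v} = \bar{\mathbf{v}}] \in C\bigl[1, \tfrac{1+\varepsilon}{1-\varepsilon}\bigr] \cdot \rho_r(q^{-1}\mathbb{Z}^n + \bar{\mathbf{v}}).$$

Since $r \geq \eta_\varepsilon(q^{-1}\mathbb{Z}^n)$ (by Lemma 2.5), Lemma 2.7 implies that $\Pr[\mathbf{v} = \bar{\mathbf{v}}] \in \bigl[\tfrac{1-\varepsilon}{1+\varepsilon}, \tfrac{1+\varepsilon}{1-\varepsilon}\bigr] C'$ for a constant $C'$
that is independent of $\bar{\mathbf{v}}$. By Claim 2.1, this shows that $\mathbf{a}'$ is within statistical distance $1 - ((1-\varepsilon)/(1+\varepsilon))^2 \leq
4\varepsilon$ of the uniform distribution.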

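It remains to show that the reduction maps $A_{q,\mathbf{s},D_\alpha}$ to $A_{q',\mathbf{G}\mathbf{s},D_{\alpha'}}$. Let the input sample from the former
distribution be $(\mathbf{a}, b = \langle \mathbf{a}, \mathbf{s} \rangle + e)$, where $e \leftarrow D_\alpha$. As argued above, the output $\mathbf{a}'$ is (nearly) uniform
over $\mathbb{T}_{q'}^{n'}$. So condition now on any fixed value $\bar{\mathbf{a}}' \in \mathbb{T}_{q'}^{n'}$ of $\mathbf{a}'$, and let $\bar{\mathbf{v}} = \mathbf{G}^T\bar{\mathbf{a}}' \bmod \mathbb{Z}^n$. We have

$$b' = \langle \mathbf{a}, \mathbf{s} \rangle + e + e' = \langle \bar{\mathbf{a}}', \mathbf{G}\mathbf{s} \rangle + e + \langle -\mathbf{f}, \mathbf{s} \rangle + e' \bmod 1.$$

By Claim 2.1 and (3.1) (and noting that if $\mathbf{f} = \bar{\mathbf{f}}$ then $\mathbf{a} = \bar{\mathbf{v}} - \bar{\mathbf{f}} \bmod \mathbb{Z}^n$), the distribution of $-\mathbf{f}$ is
within statistical distance $1 - (1-\varepsilon)/(1+\varepsilon) \leq 2\varepsilon$ of $D_{q^{-1}\mathbb{Z}^n - \bar{\mathbf{v}}, r}$. By Lemma 2.9 (using $r \geq \sqrt{2}\eta_\varepsilon(q^{-1}\mathbb{Z}^n)$
and $\|\mathbf{s}\| \leq B$), the distribution of $\langle -\mathbf{f}, \mathbf{s} \rangle + e'$ is within statistical distance $6\varepsilon$ from $D_t$, where $t^2 = r^2(\|\mathbf{s}\|^2 +
B^2)$. It therefore follows that $e + \langle -\mathbf{f}, \mathbf{s} \rangle + e'$ is within statistical distance $6\varepsilon$ from $D_{(t^2+\alpha^2)^{1/2}}$, as required. □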

4  Hardness  of  LWE  with  Binary  Secret 

The  following  is  the  main  theorem  of  this  section. 

Theorem 4.1. Let $k, q \geq 1$, and $m \geq n \geq 1$ be integers, and let $\varepsilon \in (0, 1/2)$, $\alpha, \delta > 0$, be such that $n \geq
(k + 1)\log_2 q + 2\log_2(1/\delta)$, $\alpha \geq \sqrt{\ln(2n(1 + 1/\varepsilon))/\pi}/q$. There exist three (transformation) reductions
from $\mathrm{LWE}_{k,m,q,\alpha}$ to $\mathrm{binLWE}_{n,m,q,\leq\sqrt{10n}\alpha}$, such that for any algorithm for the latter problem with advantage
$\zeta$, at least one of the reductions produces an algorithm for the former problem with advantage at least

$$(\zeta - \delta)/(3m) - 41\varepsilon/2 - \sum_{p \mid q,\ p \text{ prime}} p^{-k-1}. \qquad (4.1)$$
By combining Theorem 4.1 with the reduction in Corollary 3.2 (and noting that $\{0, 1\}^n$ is $(\sqrt{n}, 0)$-
bounded), we can replace the binLWE problem above with $\mathrm{binLWE}_{n,m,q',\leq\beta}$ for any $q' \geq 1$ and $\xi > 0$ where

$$\beta := \Bigl(10n\alpha^2 + \frac{4n}{\pi q'^2}\ln(2n(1 + 1/\xi))\Bigr)^{1/2},$$

while decreasing the advantage in (4.1) by $14\xi m$. Recalling that LWE of dimension $k = \sqrt{n}$ and modulus
$q = 2^{k/2}$ (assume $k$ is even) is known to be classically as hard as $\sqrt{n}$-dimensional lattice problems
(Theorems 2.16 and 2.17), this gives a formal statement of Theorem 1.1. The modulus $q'$ can be taken almost as
small as $\sqrt{n}$.
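
As a worked check of the displayed value of $\beta$, one can substitute into Corollary 3.2 the input noise rate $\sqrt{10n}\,\alpha$ read off from the term $10n\alpha^2$, the bound $B = \sqrt{n}$ for binary secrets, $\delta = 0$, and $\xi$ in the role of $\varepsilon$. This is a sketch of the substitution only, not an additional claim:

```latex
\beta^2
  \;=\; \bigl(\sqrt{10n}\,\alpha\bigr)^2
        + \frac{4}{\pi}\,\ln\bigl(2n(1 + 1/\xi)\bigr)\cdot\Bigl(\frac{\sqrt{n}}{q'}\Bigr)^2
  \;=\; 10n\alpha^2 + \frac{4n}{\pi q'^2}\,\ln\bigl(2n(1 + 1/\xi)\bigr),
\qquad
\text{advantage loss} \;=\; \delta + 14\xi m \;=\; 14\xi m .
```

The second term is $O(\log n)$ once $q'$ is of order $\sqrt{n}$, which is why $q'$ can be taken almost that small.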

For most purposes the sum over prime factors of $q$ in (4.1) is negligible. For instance, in deriving
the formal statement of Theorem 1.1 above, we used a $q$ that is a power of 2, in which case the sum is
$2^{-k-1} = 2^{-\sqrt{n}-1}$, which is negligible. If needed, one can improve this by applying the modulus switching
reduction (Corollary 3.3) before applying Theorem 4.1 in order to make $q$ prime. (Strictly speaking, one
also needs to apply Lemma 2.15 to replace the "unknown noise" variant of LWE given by Corollary 3.3 with
the fixed noise variant.) This improves the advantage loss to $q^{-k-1}$, which is roughly $2^{-n}$.
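
The expression (4.1) and its prime-factor term are easy to evaluate for concrete parameters. The Python sketch below is purely illustrative; the function names and the numeric values are assumptions.

```python
def prime_factors(q):
    """Distinct prime factors of q by trial division."""
    factors, p = [], 2
    while p * p <= q:
        if q % p == 0:
            factors.append(p)
            while q % p == 0:
                q //= p
        p += 1
    if q > 1:
        factors.append(q)
    return factors


def advantage_lower_bound(zeta, delta, m, eps, k, q):
    """Evaluate (zeta - delta)/(3m) - 41*eps/2 - sum over primes p dividing q of p^(-k-1)."""
    prime_sum = sum(p ** (-(k + 1)) for p in prime_factors(q))
    return (zeta - delta) / (3 * m) - 41 * eps / 2 - prime_sum


k = 16
q = 2 ** (k // 2)                       # power-of-2 modulus q = 2^(k/2)
loss_power_of_two = sum(p ** (-(k + 1)) for p in prime_factors(q))
bound = advantage_lower_bound(zeta=0.5, delta=2.0 ** -40, m=1000, eps=2.0 ** -40, k=k, q=q)
```

For a power-of-2 modulus the only prime factor is 2, so the term equals $2^{-k-1}$; it shrinks quickly with $k$ but can still dominate the other loss terms for small $k$, which is the motivation for switching to a prime modulus.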

At a high level, the proof of the theorem follows by combining three main steps. The first, given in
Section 4.1, reduces LWE to a variant in which the first equation is errorless. The second, given in Section 4.2,
reduces the latter to the intermediate problem extLWE, another variant of LWE in which some information
on the noise elements is leaked. Finally, in Section 4.3, we reduce extLWE to LWE with $\{0, 1\}$ secret. We
note that the first reduction is relatively standard; it is the other two that we consider as the main contribution
of this section. We now proceed with more details (see also Figure 1).

Proof. First, since $m \ge n$, Lemma 4.3 provides a transformation reduction from $\mathrm{LWE}_{k,m,q,\alpha}$ to
first-is-errorless $\mathrm{LWE}_{k+1,n,q,\alpha}$, while reducing the advantage by at most $2^{-k+1}$. Next, Lemma 4.7 with
$Z = \{0,1\}^n$, which is of quality $\xi = 2$ by Claim 4.6, reduces the latter problem to
$\mathrm{extLWE}_{k+1,n,q,\sqrt{5}\alpha,\{0,1\}^n}$ while reducing the advantage by at most $33\varepsilon/2$. Then, Lemma 4.8 reduces
the latter problem to $\mathrm{extLWE}^m_{k+1,n,q,\sqrt{5}\alpha,\{0,1\}^n}$, while losing a factor of $m$ in the advantage.
Finally, Lemma 4.9 provides three reductions to $\mathrm{binLWE}_{n,m,q,\le\sqrt{10n}\alpha}$:
two from the latter problem, and one from $\mathrm{LWE}_{k+1,m,q,\sqrt{5n}\alpha}$, guaranteeing that the sum of advantages is
at least the original advantage minus $4m\varepsilon + \delta$. Together with the trivial reduction from $\mathrm{LWE}_{k,m,q,\alpha}$ to
$\mathrm{LWE}_{k+1,m,q,\sqrt{5n}\alpha}$ (which incurs no loss in advantage), this completes the proof. □
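
The noise parameters appearing in this chain can be checked by direct substitution. The short computation below is only a reader's aid; it assumes, as the parameters in the proof suggest, that Lemma 4.7 is applied with $r = \alpha$ and that Lemma 4.9 is applied with $\beta = \sqrt{5}\alpha$ (both assumptions are inferred from the subscripts above, not stated explicitly in the text).

```latex
% Lemma 4.7 with quality \xi = 2 and (assumed) r = \alpha:
(\alpha^2 \xi^2 + r^2)^{1/2} = (4\alpha^2 + \alpha^2)^{1/2} = \sqrt{5}\,\alpha .
% Lemma 4.9 with (assumed) \beta = \sqrt{5}\,\alpha, using its relations
% \alpha_{4.9} = \sqrt{2n}\,\beta and \gamma = \sqrt{n}\,\beta:
\sqrt{2n}\,\beta = \sqrt{10n}\,\alpha , \qquad \sqrt{n}\,\beta = \sqrt{5n}\,\alpha .
```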

4.1  First-is-errorless  LWE 

We first define a variant of LWE in which the first equation is given without error, and then show in
Lemma 4.3 that it is still hard.

Definition 4.2. For integers $n, q \ge 1$ and an error distribution $\phi$ over $\mathbb{R}$, the “first-is-errorless” variant
of the LWE problem is to distinguish between the following two scenarios. In the first, the first sample is
uniform over $\mathbb{T}_q^n \times \mathbb{T}_q$ and the rest are uniform over $\mathbb{T}_q^n \times \mathbb{T}$. In the second, there is an unknown uniformly
distributed $s \in \{0, \ldots, q-1\}^n$, the first sample we get is from $A_{q,s,\{0\}}$ (where $\{0\}$ denotes the distribution
that is deterministically zero) and the rest are from $A_{q,s,\phi}$.

Lemma 4.3. For any $n \ge 2$, $m, q \ge 1$, and error distribution $\phi$, there is an efficient (transformation)
reduction from $\mathrm{LWE}_{n-1,m,q,\phi}$ to the first-is-errorless variant of $\mathrm{LWE}_{n,m,q,\phi}$ that reduces the advantage by
at most $\sum_p p^{-n}$, with the sum going over all prime factors of $q$.
Figure 1: Summary of reductions used in Theorem 4.1.


Notice that if $q$ is prime the loss in advantage is at most $q^{-n}$. Alternatively, for any number $q$ we can bound
it by

$$\sum_{k \ge 2} k^{-n} \le 2^{-n} + \int_2^\infty t^{-n}\,dt \le 2^{-n+2},$$

which might be good enough when $n$ is large.

Proof. The reduction starts by choosing a vector $a'$ uniformly at random from $\{0, \ldots, q-1\}^n$. Let $r$ be the
greatest common divisor of the coordinates of $a'$. If it is not coprime to $q$, we abort. The probability that
this happens is at most

$$\sum_{p \text{ prime},\ p \mid q} p^{-n}.$$

Assuming we do not abort, we proceed by finding a matrix $U \in \mathbb{Z}^{n\times n}$ that is invertible modulo $q$ and
whose leftmost column is $a'$. Such a matrix exists, and can be found efficiently. For instance, using the
extended GCD algorithm, we find an $n \times n$ unimodular matrix $R$ such that $Ra' = (r, 0, \ldots, 0)^T$. Then
$R^{-1} \cdot \mathrm{diag}(r, 1, \ldots, 1)$ is the desired matrix. We also pick a uniform element $s_0 \in \{0, \ldots, q-1\}$. The
reduction now proceeds as follows. The first sample it outputs is $(a'/q, s_0/q)$. The remaining samples are
produced by taking a sample $(a, b)$ from the given oracle, picking a fresh uniformly random $d \in \mathbb{T}_q$, and
outputting $(U(d|a), b + (s_0 \cdot d))$ with the vertical bar denoting concatenation. It is easy to verify correctness:
given uniform samples, the reduction outputs uniform samples (with the first sample’s $b$ component
uniform over $\mathbb{T}_q$), up to statistical distance $2^{-n+1}$; and given samples from $A_{q,s,\phi}$, the reduction outputs
one sample from $A_{q,s',\{0\}}$ and the remaining samples from $A_{q,s',\phi}$, up to statistical distance $2^{-n+1}$, where
$s' = (U^{-1})^T(s_0|s) \bmod q$. This proves correctness since $U$, being invertible modulo $q$, induces a bijection
on $\mathbb{Z}_q^n$, and so $s'$ is uniform in $\{0, \ldots, q-1\}^n$. □
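
The matrix construction in the proof can be illustrated with a small sketch. The code below is not taken from the paper; it is one possible way, under the assumption that the entries of $a'$ are nonnegative integers, to build $R^{-1}$ incrementally with the extended GCD algorithm and then form $U = R^{-1} \cdot \mathrm{diag}(r, 1, \ldots, 1)$. The function names (`ext_gcd`, `matrix_with_first_column`, `int_det`) are hypothetical.

```python
from math import gcd


def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quo = old_r // r
        old_r, r = r, old_r - quo * r
        old_x, x = x, old_x - quo * x
        old_y, y = y, old_y - quo * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def matrix_with_first_column(a, q):
    """Return an integer matrix U with leftmost column a and det(U) = gcd(a),
    or None (abort) when gcd(a) is not coprime to q."""
    n = len(a)
    v = list(a)
    # Invariant: r_inv is unimodular and r_inv * v == a.
    r_inv = [[int(i == j) for j in range(n)] for i in range(n)]
    for i in range(1, n):
        g, x, y = ext_gcd(v[0], v[i])
        if g == 0:
            continue  # both entries are zero, nothing to eliminate
        c0, ci = v[0] // g, v[i] // g
        # Right-multiply r_inv by the inverse of the 2x2 unimodular step
        # [[x, y], [-ci, c0]] acting on coordinates (0, i).
        for row in r_inv:
            r0, ri = row[0], row[i]
            row[0] = r0 * c0 + ri * ci
            row[i] = -r0 * y + ri * x
        v[0], v[i] = g, 0
    r = v[0]
    if gcd(r, q) != 1:
        return None
    # U = R^{-1} * diag(r, 1, ..., 1): scale the first column by r.
    return [[row[0] * r] + row[1:] for row in r_inv]


def int_det(M):
    """Integer determinant by cofactor expansion (adequate for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, x in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * x * int_det(minor)
    return total


U = matrix_with_first_column([4, 6, 9], 8)
```

Since the determinant of the returned matrix equals the gcd $r$, it is invertible modulo $q$ exactly when the reduction does not abort.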

4.2  Extended  LWE 

We next define the intermediate problem extLWE. (This definition is of an easier problem than the one
considered in previous work [AP12], which makes our hardness result stronger.)
Definition 4.4. For $n, m, q, t \ge 1$, $Z \subseteq \mathbb{Z}^m$, and a distribution $\chi$ over $\frac{1}{q}\mathbb{Z}^m$, the $\mathrm{extLWE}^t_{n,m,q,\chi,Z}$ problem
is as follows. The algorithm gets to choose $z \in Z$ and then receives a tuple

$$(A, (b_i)_{i\in[t]}, (\langle e_i, z\rangle)_{i\in[t]}) \in \mathbb{T}_q^{n\times m} \times (\mathbb{T}_q^m)^t \times (\tfrac{1}{q}\mathbb{Z})^t.$$

Its goal is to distinguish between the following two cases. In the first, $A \in \mathbb{T}_q^{n\times m}$ is chosen uniformly,
$e_i \in \frac{1}{q}\mathbb{Z}^m$ are chosen from $\chi$, and $b_i = A^T s_i + e_i \bmod 1$ where $s_i \in \{0, \ldots, q-1\}^n$ are chosen uniformly.
The second case is identical, except that the $b_i$ are chosen uniformly in $\mathbb{T}_q^m$ independently of everything else.


When $t = 1$, we omit the superscript $t$. Also, when $\chi$ is $D_{q^{-1}\mathbb{Z}^m,\alpha}$ for some $\alpha > 0$, we replace the
subscript $\chi$ by $\alpha$. We note that a discrete version of LWE can be defined as a special case of extLWE by
setting $Z = \{0^m\}$. We next define a measure of quality of sets $Z$.

Definition 4.5. For a real $\xi > 0$ and a set $Z \subseteq \mathbb{Z}^m$ we say that $Z$ is of quality $\xi$ if given any $z \in Z$, we can
efficiently find a unimodular matrix $U \in \mathbb{Z}^{m\times m}$ such that if $U' \in \mathbb{Z}^{m\times(m-1)}$ is the matrix obtained from $U$
by removing its leftmost column then all of the columns of $U'$ are orthogonal to $z$ and its largest singular
value is at most $\xi$.


The idea in this definition is that the columns of $U'$ form a basis of the lattice of integer points that are
orthogonal to $z$, i.e., the lattice $\{b \in \mathbb{Z}^m : \langle b, z\rangle = 0\}$. The quality measures how “short” we can make this
basis.

Claim 4.6. The set $Z = \{0, 1\}^m$ is of quality 2.


Proof. Let $z \in Z$ and assume without loss of generality that its first $k \ge 1$ coordinates are 1 and the
remaining $m - k$ are 0. Then consider the upper bidiagonal matrix $U$ whose diagonal is all 1s and whose
diagonal above the main diagonal is $(-1, \ldots, -1, 0, \ldots, 0)$ with $-1$ appearing $k - 1$ times. The matrix is
clearly unimodular and all the columns except the first one are orthogonal to $z$. Moreover, by the triangle
inequality, we can bound the operator norm of $U$ by the sum of that of the diagonal 1 matrix and the
off-diagonal matrix, both of which clearly have norm at most 1. □
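
As a small sanity check of the construction in this proof, the following sketch builds the bidiagonal matrix for a sorted $z$ and verifies the two properties used in Definition 4.5. It is illustrative only: the helper names are hypothetical, $z$ is assumed to already have its ones first (the "without loss of generality" case), and the norm is bounded by the standard estimate $\|U\|_2 \le \sqrt{\|U\|_1 \|U\|_\infty}$ rather than by the triangle-inequality argument of the proof.

```python
def bidiagonal_basis(m, k):
    """Upper bidiagonal U for z = (1,...,1,0,...,0) with k >= 1 leading ones."""
    U = [[0] * m for _ in range(m)]
    for i in range(m):
        U[i][i] = 1
    for i in range(k - 1):
        U[i][i + 1] = -1  # superdiagonal: -1 repeated k-1 times, then zeros
    return U


def columns_orthogonal_to(U, z):
    """True if every column of U except the leftmost is orthogonal to z."""
    m = len(z)
    return all(sum(U[i][j] * z[i] for i in range(m)) == 0 for j in range(1, m))


def spectral_norm_bound(U):
    """Upper bound sqrt(max column sum * max row sum) on the largest singular value."""
    max_row = max(sum(abs(x) for x in row) for row in U)
    max_col = max(sum(abs(row[j]) for row in U) for j in range(len(U[0])))
    return (max_row * max_col) ** 0.5


m, k = 6, 4
z = [1] * k + [0] * (m - k)
U = bidiagonal_basis(m, k)
```

Removing the leftmost column can only decrease the largest singular value, so the bound on $U$ also applies to $U'$.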

Lemma 4.7. Let $Z \subseteq \mathbb{Z}^m$ be of quality $\xi > 0$. Then for any $n, q \ge 1$, $\varepsilon \in (0, 1/2)$, and
$\alpha, r \ge (\ln(2m(1 + 1/\varepsilon))/\pi)^{1/2}/q$, there is a (transformation) reduction from the first-is-errorless variant of
$\mathrm{LWE}_{n,m,q,\alpha}$ to $\mathrm{extLWE}_{n,m,q,(\alpha^2\xi^2+r^2)^{1/2},Z}$ that reduces the advantage by at most $33\varepsilon/2$.

Proof. We first describe the reduction. Assume we are asked to provide samples for some $z \in Z$. We
compute a unimodular $U \in \mathbb{Z}^{m\times m}$ for $z$ as in Definition 4.5, and let $U' \in \mathbb{Z}^{m\times(m-1)}$ be the matrix
formed by removing the first column of $U$. We then take $m$ samples from the given distribution, resulting in
$(A, b) \in \mathbb{T}_q^{n\times m} \times (\mathbb{T}_q \times \mathbb{T}^{m-1})$. We also sample a vector $f$ from the $m$-dimensional continuous Gaussian
distribution $D_{\alpha(\xi^2 I - U'U'^T)^{1/2}}$, which is well defined since $\xi^2 I - U'U'^T$ is a positive semidefinite matrix by
our assumption on $U$. The output of the reduction is the tuple

$$(A' = AU^T,\ b' + c,\ \langle z, f + c\rangle) \in \mathbb{T}_q^{n\times m} \times \mathbb{T}_q^m \times \tfrac{1}{q}\mathbb{Z}, \tag{4.2}$$

where $b' = Ub + f$, and $c$ is chosen from the discrete Gaussian distribution $D_{q^{-1}\mathbb{Z}^m - b', r}$ (using Lemma 2.3).

We now prove the correctness of the reduction. Consider first the case that we get valid LWE equations,
i.e., $A$ is uniform in $\mathbb{T}_q^{n\times m}$ and $b = A^T s + e \in \mathbb{T}^m$ where $s \in \{0, \ldots, q-1\}^n$ is uniformly chosen, the
first coordinate of $e \in \mathbb{R}^m$ is 0, and the remaining $m - 1$ coordinates are chosen from $D_\alpha$. Since $U$ is
unimodular, $A' = AU^T$ is uniformly distributed in $\mathbb{T}_q^{n\times m}$ as required. From now on we condition on an
arbitrary $A'$ and analyze the distribution of the remaining two components of (4.2). Next,

$$b' = Ub + f = A'^T s + Ue + f.$$

Since $Ue$ is distributed as a continuous Gaussian $D_{\alpha U'}$, the vector $Ue + f$ is distributed as a spherical
continuous Gaussian $D_{\alpha\xi}$. Moreover, since $A'^T s \in \mathbb{T}_q^m$, the coset $q^{-1}\mathbb{Z}^m - b'$ is identical to
$q^{-1}\mathbb{Z}^m - (Ue + f)$, so we can see $c$ as being chosen from $D_{q^{-1}\mathbb{Z}^m - (Ue+f), r}$. Therefore, by Lemma 2.10 and using
that $r \ge \eta_\varepsilon(q^{-1}\mathbb{Z}^m)$ by Lemma 2.5, the distribution of $Ue + f + c$ is within statistical distance $8\varepsilon$ of
$D_{q^{-1}\mathbb{Z}^m, (\alpha^2\xi^2+r^2)^{1/2}}$. This shows that the second component in (4.2) is also distributed correctly. Finally,
for the third component, by our assumption on $U$ and the fact that the first coordinate of $e$ is zero,

$$\langle z, f + c\rangle = \langle z, Ue + f + c\rangle,$$

and so the third component gives the inner product of the noise with $z$, as desired.

We now consider the case where the input is uniform, i.e., that $A$ is uniform in $\mathbb{T}_q^{n\times m}$ and $b$ is
independent and uniform in $\mathbb{T}_q \times \mathbb{T}^{m-1}$. We first observe that by Lemma 2.6, since $\alpha \ge \eta_{\varepsilon/m}(q^{-1}\mathbb{Z})$ (by
Lemma 2.5), the distribution of $(A, b)$ is within statistical distance $\varepsilon/2$ of the distribution of $(A, e' + e)$
where $e'$ is chosen uniformly in $\mathbb{T}_q^m$, the first coordinate of $e$ is zero, and its remaining $m - 1$ coordinates
are chosen independently from $D_\alpha$. So from now on assume our input is $(A, e' + e)$. The first component
of (4.2) is uniform in $\mathbb{T}_q^{n\times m}$ as before, and moreover, it is clearly independent of the other two. Moreover,
since $b' = Ue' + Ue + f$ and $Ue' \in \mathbb{T}_q^m$, the coset $q^{-1}\mathbb{Z}^m - b'$ is identical to $q^{-1}\mathbb{Z}^m - (Ue + f)$, and so
$c$ is distributed identically to the case of a valid LWE equation, and in particular is independent of $e'$. This
establishes that the third component of (4.2) is correctly distributed; moreover, since $e'$ is independent of
the first and third components, and $Ue'$ is uniform in $\mathbb{T}_q^m$ (since $U$ is unimodular), we get that the second
component is uniform and independent of the other two, as desired. □

We  end  this  section  by  stating  the  standard  reduction  to  the  multi-secret  (t  >  1)  case  of  extended  LWE. 

Lemma 4.8. Let $n, m, q, \chi, Z$ be as in Definition 4.4 with $\chi$ efficiently sampleable, and let $t \ge 1$ be an
integer. Then there is an efficient (transformation) reduction from $\mathrm{extLWE}_{n,m,q,\chi,Z}$ to $\mathrm{extLWE}^t_{n,m,q,\chi,Z}$ that
reduces the advantage by a factor of $t$.

The  proof  is  by  a  standard  hybrid  argument.  We  bring  it  here  for  the  sake  of  completeness.  We  note  that 
the  distribution  of  the  secret  vector  s  needs  to  be  sampleable  but  otherwise  it  plays  no  role  in  the  proof.  The 
lemma  therefore  naturally  extends  to  any  (sampleable)  distribution  of  s. 

Proof. Let $\mathcal{A}$ be an algorithm for $\mathrm{extLWE}^t_{n,m,q,\chi,Z}$, let $z$ be the vector output by $\mathcal{A}$ in the first step (note that
this is a random variable) and let $H_i$ denote the distribution

$$(A, \{b_1, \ldots, b_i, u_{i+1}, \ldots, u_t\}, z, \{\langle z, e_i\rangle\}_{i\in[t]}),$$

where $u_{i+1}, \ldots, u_t$ are sampled independently and uniformly in $\mathbb{T}_q^m$. Then by definition
$\mathrm{Adv}[\mathcal{A}] = |\Pr[\mathcal{A}(H_0)] - \Pr[\mathcal{A}(H_t)]|$.

We now describe an algorithm $\mathcal{B}$ for $\mathrm{extLWE}_{n,m,q,\chi,Z}$: First, $\mathcal{B}$ runs $\mathcal{A}$ to obtain $z$ and sends it to the
challenger as its own $z$. Then, given an input $(A, d, z, y)$ for $\mathrm{extLWE}_{n,m,q,\chi,Z}$, the distinguisher $\mathcal{B}$ samples
$i^* \leftarrow [t]$, and in addition $s_1, \ldots, s_{i^*-1} \leftarrow \mathbb{Z}_q^n$, $e_1, \ldots, e_{i^*-1}, e_{i^*+1}, \ldots, e_t \leftarrow \chi$, $u_{i^*+1}, \ldots, u_t \leftarrow \mathbb{T}_q^m$. It
sets $b_i = A^T \cdot s_i + e_i \pmod 1$, and sends the following to $\mathcal{A}$:

$$(A, \{b_1, \ldots, b_{i^*-1}, d, u_{i^*+1}, \ldots, u_t\}, z, \{\langle z, e_1\rangle, \ldots, \langle z, e_{i^*-1}\rangle, y, \langle z, e_{i^*+1}\rangle, \ldots, \langle z, e_t\rangle\}).$$

Finally, $\mathcal{B}$ outputs the same output as $\mathcal{A}$ did.

Note that when the input to $\mathcal{B}$ is distributed as $P_0 = (A, b, z, z^T \cdot e)$ with $b = A^T \cdot s + e \pmod 1$, then
$\mathcal{B}$ feeds $\mathcal{A}$ with exactly the distribution $H_{i^*}$. On the other hand, if the input to $\mathcal{B}$ is $P_1 = (A, u, z, z^T \cdot e)$
with $u \leftarrow \mathbb{T}_q^m$, then $\mathcal{B}$ feeds $\mathcal{A}$ with $H_{i^*-1}$.

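Since $i^*$ is uniform in $[t]$, we get that

$$t \cdot \mathrm{Adv}[\mathcal{B}] = t\,\bigl|\Pr[\mathcal{B}(P_0)] - \Pr[\mathcal{B}(P_1)]\bigr|
= \Bigl|\sum_{i^*\in[t]} \bigl(\Pr[\mathcal{A}(H_{i^*})] - \Pr[\mathcal{A}(H_{i^*-1})]\bigr)\Bigr|
= \bigl|\Pr[\mathcal{A}(H_t)] - \Pr[\mathcal{A}(H_0)]\bigr|
= \mathrm{Adv}[\mathcal{A}],$$

and the result follows. □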


4.3  Reducing  to  binary  secret 

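Lemma 4.9. Let $k, n, m, q \in \mathbb{N}$, $\varepsilon \in (0, 1/2)$, and $\delta, \alpha, \beta, \gamma > 0$ be such that $n \ge k \log_2 q + 2 \log_2(1/\delta)$,
$\beta \ge \sqrt{2\ln(2n(1 + 1/\varepsilon))/\pi}/q$, $\alpha = \sqrt{2n}\,\beta$, $\gamma = \sqrt{n}\,\beta$. Then there exist three efficient (transformation)
reductions to $\mathrm{binLWE}_{n,m,q,\le\alpha}$ from $\mathrm{extLWE}^m_{k,n,q,\beta,\{0,1\}^n}$, $\mathrm{LWE}_{k,m,q,\gamma}$, and $\mathrm{extLWE}^m_{k,n,q,\beta,\{0^n\}}$, such that if
$\mathcal{B}_1$, $\mathcal{B}_2$, and $\mathcal{B}_3$ are the algorithms obtained by applying these reductions (respectively) to an algorithm $\mathcal{A}$,
then

$$\mathrm{Adv}[\mathcal{A}] \le \mathrm{Adv}[\mathcal{B}_1] + \mathrm{Adv}[\mathcal{B}_2] + \mathrm{Adv}[\mathcal{B}_3] + 4m\varepsilon + \delta.$$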


Pointing out the trivial (transformation) reduction from $\mathrm{extLWE}^m_{k,n,q,\beta,\{0,1\}^n}$ to $\mathrm{extLWE}^m_{k,n,q,\beta,\{0^n\}}$, the
lemma implies the hardness of $\mathrm{binLWE}_{n,m,q,\le\alpha}$ based on the hardness of $\mathrm{extLWE}^m_{k,n,q,\beta,\{0,1\}^n}$ and $\mathrm{LWE}_{k,m,q,\gamma}$.
We note that our proof is actually more general, and holds for any binary distribution of min-entropy at
least $k \log_2 q + 2 \log_2(1/\delta)$, and not just a uniform binary secret as in the definition of binLWE.

Proof. The proof follows by a sequence of hybrids. Let $k, n, m, q, \varepsilon, \alpha, \beta, \gamma$ be as in the lemma statement.
We consider $z \leftarrow \{0, 1\}^n$ and $e \leftarrow D^m_{\alpha'}$ for $\alpha' = \sqrt{\beta^2\|z\|^2 + \gamma^2} \le \sqrt{2n}\,\beta = \alpha$. In addition, we let
$A \leftarrow \mathbb{T}_q^{n\times m}$, $u \leftarrow \mathbb{T}^m$, and define $b := A^T \cdot z + e \pmod 1$. We consider an algorithm $\mathcal{A}$ that distinguishes
between $(A, b)$ and $(A, u)$.

We let $H_0$ denote the distribution $(A, b)$ and $H_1$ the distribution

$$H_1 = (A, A^T z - N^T z + \hat{e} \bmod 1),$$

where $N \leftarrow D^{n\times m}_{q^{-1}\mathbb{Z},\beta}$ and $\hat{e} \leftarrow D^m_\gamma$. Using $\|z\| \le \sqrt{n}$ and that $\beta \ge \sqrt{2}\,\eta_\varepsilon(\mathbb{Z}^n)/q$ (by Lemma 2.5), it follows
by Lemma 2.9 that the statistical distance between $-N^T z + \hat{e}$ and $D^m_{\alpha'}$ is at most $4m\varepsilon$. It thus follows that

$$|\Pr[\mathcal{A}(H_0)] - \Pr[\mathcal{A}(H_1)]| \le 4m\varepsilon. \tag{4.3}$$

We define a distribution $H_2$ as follows. Let $B \leftarrow \mathbb{T}_q^{k\times m}$ and $C \leftarrow \mathbb{T}_q^{k\times n}$. Let $\hat{A} := qC^T \cdot B + N \pmod 1$.
Finally,

$$H_2 = (\hat{A}, \hat{A}^T \cdot z - N^T z + \hat{e}) = (\hat{A}, qB^T \cdot C \cdot z + \hat{e}).$$

We now argue that there exists an adversary $\mathcal{B}_1$ for problem $\mathrm{extLWE}^m_{k,n,q,\beta,\{0,1\}^n}$, such that

$$\mathrm{Adv}[\mathcal{B}_1] = |\Pr[\mathcal{A}(H_1)] - \Pr[\mathcal{A}(H_2)]|. \tag{4.4}$$

This is because $H_1, H_2$ can be viewed as applying the same efficient transformation on the distributions
$(C, A, N^T z)$ and $(C, \hat{A}, N^T z)$ respectively. Since distinguishing the latter distributions is exactly the
$\mathrm{extLWE}^m_{k,n,q,\beta,\{0,1\}^n}$ problem (where the columns of $q \cdot B$ are interpreted as the $m$ secret vectors), the
distinguisher $\mathcal{B}_1$ follows by first applying the aforementioned transformation and then applying $\mathcal{A}$.

For the next hybrid, we define $H_3 = (\hat{A}, B^T \cdot s + \hat{e})$, for $s \leftarrow \mathbb{Z}_q^k$. It follows that

$$|\Pr[\mathcal{A}(H_2)] - \Pr[\mathcal{A}(H_3)]| \le \delta \tag{4.5}$$

by the leftover hash lemma (see Lemma 2.2), since $H_2, H_3$ can be derived from $(C, qC \cdot z)$ and $(C, s)$
respectively, whose statistical distance is at most $\delta$.

Our next hybrid makes the second component uniform: $H_4 = (\hat{A}, u)$. There exists an algorithm $\mathcal{B}_2$ for
$\mathrm{LWE}_{k,m,q,\gamma}$ such that

$$\mathrm{Adv}[\mathcal{B}_2] = |\Pr[\mathcal{A}(H_3)] - \Pr[\mathcal{A}(H_4)]|, \tag{4.6}$$

since $H_3, H_4$ can be computed efficiently from $(B, B^T s + \hat{e})$, $(B, u)$.

Lastly, we change $\hat{A}$ back to uniform: $H_5 = (A, u)$. There exists an algorithm $\mathcal{B}_3$ for $\mathrm{extLWE}^m_{k,n,q,\beta,\{0^n\}}$
such that

$$\mathrm{Adv}[\mathcal{B}_3] = |\Pr[\mathcal{A}(H_4)] - \Pr[\mathcal{A}(H_5)]|. \tag{4.7}$$

Eq. (4.7) is derived very similarly to Eq. (4.4): We notice that $H_4, H_5$ can be viewed as applying the same
efficient transformation on the distributions $(C, \hat{A})$ and $(C, A)$ respectively. Since distinguishing the latter
distributions is exactly the $\mathrm{extLWE}^m_{k,n,q,\beta,\{0^n\}}$ problem (where the columns of $q \cdot B$ are interpreted as the $m$
secret vectors), the distinguisher $\mathcal{B}_3$ follows by first applying the aforementioned transformation and then
applying $\mathcal{A}$.

Putting together Eq. (4.3), (4.4), (4.5), (4.6), (4.7), the lemma follows. □


5  Exact  Gaussian  Sampler 

In this section we prove Lemma 2.3. As in [GPV08], the proof consists of two parts. In the first we consider
the one-dimensional case, and in the second we use it recursively to sample from arbitrary lattices. Our
one-dimensional sampler is based on rejection sampling, just like the one in [GPV08]. Unlike [GPV08], we
use the continuous normal distribution as the source distribution, which allows us to avoid truncation and,
as a result, obtain an exact sample. Our second part uses the same recursive routine as in [GPV08], but adds a
rejection sampling step to it in order to take care of the deviation of its output from the desired distribution.

5.1  The  one-dimensional  case 

Here we show how to sample from the discrete Gaussian distribution on arbitrary cosets of one-dimensional
lattices. We use a standard rejection sampling procedure (see, e.g., [Dev86, Page 117] for a very similar
procedure).

By scaling, we can restrict without loss of generality to the lattice $\mathbb{Z}$, i.e., we consider the task of
sampling from $D_{\mathbb{Z}+c,r}$ for a given coset representative $c \in [0, 1)$ and parameter $r > 0$. The sampling
procedure is as follows. Let $Z_0 = \int_c^\infty \rho_r(x)\,dx$, and $Z_1 = \int_{-\infty}^{c-1} \rho_r(x)\,dx$. These two numbers can be
computed efficiently by expressing them in terms of the error function. Let $Z = Z_0 + Z_1 + \rho_r(c) + \rho_r(c - 1)$.
The algorithm repeats the following until it outputs an answer:

•  With probability $\rho_r(c)/Z$ it outputs $c$;

•  With probability $\rho_r(c - 1)/Z$ it outputs $c - 1$;

•  With probability $Z_0/Z$ it chooses $x$ from the restriction of the continuous normal distribution $D_r$ to
the interval $[c, \infty)$. Let $y$ be the smallest element in $\mathbb{Z} + c$ that is larger than $x$. With probability
$\rho_r(y)/\rho_r(x)$ output $y$, and otherwise repeat;

•  With probability $Z_1/Z$ it chooses $x$ from the restriction of the continuous normal distribution $D_r$ to
the interval $(-\infty, c - 1]$. Let $y$ be the largest element in $\mathbb{Z} + c$ that is smaller than $x$. With probability
$\rho_r(y)/\rho_r(x)$ output $y$, and otherwise repeat.

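Consider now one iteration of the procedure. The probability of outputting $c$ is $\rho_r(c)/Z$, that of
outputting $c - 1$ is $\rho_r(c - 1)/Z$, that of outputting $c + k$ for some $k \ge 1$ is

$$\frac{Z_0}{Z} \cdot \frac{1}{Z_0} \int_{c+k-1}^{c+k} \rho_r(x)\,\frac{\rho_r(c+k)}{\rho_r(x)}\,dx = \frac{\rho_r(c+k)}{Z},$$

and similarly, that of outputting $c - 1 - k$ for some $k \ge 1$ is $\rho_r(c - 1 - k)/Z$. From this it follows immediately
that conditioned on outputting something, the output distribution has support on $\mathbb{Z} + c$ and probability mass
function proportional to $\rho_r$, and is therefore the desired discrete Gaussian distribution $D_{\mathbb{Z}+c,r}$. Moreover,
the probability of outputting something is

$$\frac{\rho_r(\mathbb{Z} + c)}{Z} \ge \frac{1}{2},$$

since $\rho_r$ is monotone on each side of the origin, which gives $Z_0 \le \sum_{k \ge 0} \rho_r(c + k)$ and
$Z_1 \le \sum_{k \ge 1} \rho_r(c - k)$, and hence $Z \le 2\rho_r(\mathbb{Z} + c)$.
Therefore at each iteration the procedure has probability of at least 1/2 to terminate. As a result, the
probability that the number of iterations is greater than $t$ is at most $2^{-t}$, and in particular, the expected number of
iterations is at most 2.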
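
The one-dimensional procedure can be sketched in a few lines. The code below is a minimal illustration rather than the authors' implementation: it uses double-precision floating point, so the output is exact only up to that precision; it draws the restricted continuous Gaussian by naive rejection from `random.gauss`, which is an assumption of this sketch and is slow when $r$ is very small; and the function names are hypothetical. It uses $\rho_r(x) = \exp(-\pi x^2/r^2)$, so that $D_r$ has standard deviation $r/\sqrt{2\pi}$ and $\int_a^\infty \rho_r(x)\,dx = (r/2)\,\mathrm{erfc}(\sqrt{\pi}\,a/r)$.

```python
import math
import random


def rho(x, r):
    """Gaussian function rho_r(x) = exp(-pi * x^2 / r^2)."""
    return math.exp(-math.pi * x * x / (r * r))


def sample_coset_1d(c, r, rng=random):
    """Sample from the discrete Gaussian on Z + c with parameter r, for c in [0, 1)."""
    sigma = r / math.sqrt(2 * math.pi)  # standard deviation of D_r
    z0 = 0.5 * r * math.erfc(math.sqrt(math.pi) * c / r)        # integral over [c, inf)
    z1 = 0.5 * r * math.erfc(math.sqrt(math.pi) * (1 - c) / r)  # integral over (-inf, c-1]
    pc, pc1 = rho(c, r), rho(c - 1, r)
    total = z0 + z1 + pc + pc1
    while True:
        u = rng.random() * total
        if u < pc:
            return c
        if u < pc + pc1:
            return c - 1
        if u < pc + pc1 + z0:
            # Right tail: x from D_r restricted to [c, inf).
            x = rng.gauss(0, sigma)
            while x < c:
                x = rng.gauss(0, sigma)
            y = c + math.floor(x - c) + 1  # smallest element of Z + c above x
        else:
            # Left tail: x from D_r restricted to (-inf, c - 1].
            x = rng.gauss(0, sigma)
            while x > c - 1:
                x = rng.gauss(0, sigma)
            y = c + math.ceil(x - c) - 1   # largest element of Z + c below x
        # Accept with probability rho_r(y) / rho_r(x), which is at most 1.
        if rng.random() < math.exp(-math.pi * (y * y - x * x) / (r * r)):
            return y
```

A call such as `sample_coset_1d(0.3, 3.0, random.Random(0))` returns a point of $\mathbb{Z} + 0.3$; passing an explicit `random.Random` instance makes runs reproducible.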

5.2  The  general  case 

For completeness, we start by recalling the SampleD procedure described in [GPV08]. This is a recursive
procedure that gets as input a basis $B = (b_1, \ldots, b_n)$ of an $n$-dimensional lattice $\Lambda = \mathcal{L}(B)$, a parameter
$r > 0$, and a vector $c \in \mathbb{R}^n$, and outputs a vector in $\Lambda + c$ whose distribution is close to that of $D_{\Lambda+c,r}$.
Let $\tilde{b}_1, \ldots, \tilde{b}_n$ be the Gram-Schmidt orthogonalization of $b_1, \ldots, b_n$, and let $\bar{b}_1, \ldots, \bar{b}_n$ be the normalized
Gram-Schmidt vectors, i.e., $\bar{b}_i = \tilde{b}_i/\|\tilde{b}_i\|$. The procedure is the following.

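1.  Let $c_n \leftarrow c$. For $i \leftarrow n, \ldots, 1$, do:

(a)  Choose $v_i$ from $D_{\|\tilde{b}_i\|\mathbb{Z} + \langle c_i, \bar{b}_i\rangle, r}$ using the exact one-dimensional sampler.

(b)  Let $c_{i-1} \leftarrow c_i + (v_i - \langle c_i, \bar{b}_i\rangle) \cdot b_i/\|\tilde{b}_i\| - v_i \bar{b}_i$.

2.  Output $v := \sum_{i=1}^n v_i \bar{b}_i$.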

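It is easy to verify that the procedure always outputs vectors in the coset $\Lambda + c$. Moreover, the probability
of outputting any $v \in \Lambda + c$ is

$$\prod_{i=1}^n \frac{\rho_r(v_i)}{\rho_r(\|\tilde{b}_i\|\mathbb{Z} + \langle c_i, \bar{b}_i\rangle)} = \frac{\rho_r(v)}{\prod_{i=1}^n \rho_r(\|\tilde{b}_i\|\mathbb{Z} + \langle c_i, \bar{b}_i\rangle)},$$

where $c_i$ are the values computed in the procedure when it outputs $v$. Notice that by Lemma 2.5 and our
assumption on $r$, we have that $r \ge \eta_{1/(n+1)}(\|\tilde{b}_i\|\mathbb{Z})$ for all $i$. Therefore, by Lemma 2.7 we have that for all
$c \in \mathbb{R}$,

$$\rho_r(\|\tilde{b}_i\|\mathbb{Z} + c) \in \Bigl[1 - \frac{2}{n+2},\ 1\Bigr] \cdot \rho_r(\|\tilde{b}_i\|\mathbb{Z}).$$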


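In order to get an exact sample, we combine the above procedure with rejection sampling. Namely, we
apply SampleD to obtain some vector $v$. We then output $v$ with probability

$$\frac{\prod_{i=1}^n \rho_r(\|\tilde{b}_i\|\mathbb{Z} + \langle c_i, \bar{b}_i\rangle)}{\prod_{i=1}^n \rho_r(\|\tilde{b}_i\|\mathbb{Z})} \in \Bigl[\Bigl(1 - \frac{2}{n+2}\Bigr)^n,\ 1\Bigr] \subset (e^{-2}, 1], \tag{5.1}$$

and otherwise repeat. This probability can be efficiently computed, as we will show below. As a result, in
any given iteration the probability of outputting the vector $v \in \Lambda + c$ is

$$\frac{\rho_r(v)}{\prod_{i=1}^n \rho_r(\|\tilde{b}_i\|\mathbb{Z})}.$$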

Since the denominator is independent of $v$, we obtain that in any given iteration, conditioned on outputting
something, the output is distributed according to the desired distribution $D_{\Lambda+c,r}$, and therefore this is also
the overall output distribution of our sampler. Moreover, by (5.1), the probability of outputting something
in any given iteration is at least $e^{-2}$, and therefore, the probability that the number of iterations is greater
than $t$ is at most $(1 - e^{-2})^t$, and in particular, the expected number of iterations is at most $e^2$.
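
The recursion and the acceptance test (5.1) can be put together in a short sketch. This is an illustration under stated assumptions, not the paper's algorithm verbatim: the basis is given as a list of row vectors; arithmetic is floating point; and step (a) uses a truncated enumeration of the coset (a stand-in, hypothetical helper `coset_points`) in place of the exact one-dimensional sampler, so the result is only approximately distributed as described. The same truncated sums are used to evaluate the ratio in (5.1).

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(basis):
    """Unnormalized Gram-Schmidt vectors of the rows of `basis`."""
    gs = []
    for b in basis:
        t = list(b)
        for g in gs:
            mu = dot(b, g) / dot(g, g)
            t = [ti - mu * gi for ti, gi in zip(t, g)]
        gs.append(t)
    return gs


def coset_points(scale, shift, r, tail=12.0):
    """Points of scale*Z + shift within `tail` standard deviations of 0, with weights rho_r."""
    sigma = r / math.sqrt(2 * math.pi)
    k0 = round(-shift / scale)
    span = math.ceil(tail * sigma / scale) + 1
    pts = [scale * k + shift for k in range(k0 - span, k0 + span + 1)]
    wts = [math.exp(-math.pi * p * p / (r * r)) for p in pts]
    return pts, wts


def sample_d(basis, r, c, rng):
    """One run of the recursion; returns (v, acceptance probability as in (5.1))."""
    gs = gram_schmidt(basis)
    norms = [math.sqrt(dot(g, g)) for g in gs]
    unit = [[x / nm for x in g] for g, nm in zip(gs, norms)]
    ci = list(c)
    v = [0.0] * len(c)
    accept = 1.0
    for i in reversed(range(len(basis))):
        proj = dot(ci, unit[i])
        pts, wts = coset_points(norms[i], proj, r)
        vi = rng.choices(pts, weights=wts)[0]  # truncated stand-in for step (a)
        _, wts0 = coset_points(norms[i], 0.0, r)
        accept *= sum(wts) / sum(wts0)         # one factor of the ratio in (5.1)
        coef = (vi - proj) / norms[i]
        ci = [x + coef * b - vi * u for x, b, u in zip(ci, basis[i], unit[i])]  # step (b)
        v = [x + vi * u for x, u in zip(v, unit[i])]
    return v, accept


def sample_with_rejection(basis, r, c, rng):
    """Repeat the recursion until the rejection test accepts."""
    while True:
        v, accept = sample_d(basis, r, c, rng)
        if rng.random() < accept:
            return v
```

For example, `sample_with_rejection([[3.0, 1.0], [1.0, 2.0]], 4.0, [0.25, -0.4], random.Random(1))` returns a point of the coset $\Lambda + c$ for the lattice generated by the two rows.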

It remains to show how to efficiently compute the probability in (5.1). By scaling, it suffices to show
how to compute

$$\rho_r(\mathbb{Z} + c) = \sum_{k\in\mathbb{Z}} \exp(-\pi(k + c)^2/r^2)$$

for any $r > 0$ and $c \in [0, 1)$. If $r < 1$, the sum decays very fast, and we can achieve any desired $t$ bits of
accuracy in time $\mathrm{poly}(t)$, which agrees with our notion of efficiently computing a real number (following,
e.g., the treatment in [Lov86, Section 1.4]). For $r \ge 1$, we use the Poisson summation formula (see,
e.g., [MR04, Lemma 2.8]) to write

$$\rho_r(\mathbb{Z} + c) = r \cdot \sum_{k\in\mathbb{Z}} \exp(-\pi k^2 r^2 + 2\pi i c k) = r \cdot \sum_{k\in\mathbb{Z}} \exp(-\pi k^2 r^2) \cos(2\pi c k),$$

which again decays fast enough so we can compute it to within any desired $t$ bits of accuracy in time $\mathrm{poly}(t)$.
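
The two ways of evaluating $\rho_r(\mathbb{Z} + c)$ can be compared numerically. The sketch below is only an illustration: it truncates both series at a fixed number of terms (the `terms` parameter is an assumption of this sketch, not a bound from the text) and works in double precision rather than to a requested number of bits.

```python
import math


def rho_coset_direct(c, r, terms=60):
    """Truncated direct sum of exp(-pi * (k + c)^2 / r^2); converges fast for small r."""
    return sum(math.exp(-math.pi * (k + c) ** 2 / r ** 2)
               for k in range(-terms, terms + 1))


def rho_coset_poisson(c, r, terms=60):
    """Truncated Poisson-summation form; converges fast for large r."""
    return r * sum(math.exp(-math.pi * k * k * r * r) * math.cos(2 * math.pi * c * k)
                   for k in range(-terms, terms + 1))


def rho_coset(c, r):
    """Pick the faster-converging series, following the case split r < 1 versus r >= 1."""
    return rho_coset_direct(c, r) if r < 1 else rho_coset_poisson(c, r)
```

For moderate parameters both series agree to within rounding error, which gives a quick check of the identity.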
How to Share a Lattice Trapdoor:
Threshold Protocols for Signatures and (H)IBE

Rikke Bendlin    Sara Krehbiel    Chris Peikert


Abstract 

We develop secure threshold protocols for two important operations in lattice cryptography, namely,
generating a hard lattice Λ together with a “strong” trapdoor, and sampling from a discrete Gaussian
distribution over a desired coset of Λ using the trapdoor. These are the central operations of many
cryptographic schemes: for example, they are exactly the key-generation and signing operations (respectively)
for the GPV signature scheme, and they are the public parameter generation and private key extraction
operations (respectively) for the GPV IBE. We also provide a protocol for trapdoor delegation, which is
used in lattice-based hierarchical IBE schemes. Our work therefore directly transfers all these systems to
the threshold setting.

Our  protocols  provide  information-theoretic  (i.e.,  statistical)  security  against  adaptive  corruptions  in 
the  UC  framework,  and  they  are  private  and  robust  against  an  optimal  number  of  semi-honest  or  malicious 
parties.  Our  Gaussian  sampling  protocol  is  both  noninteractive  and  efficient,  assuming  either  a  trusted 
setup phase (e.g., performed as part of key generation) or a sufficient amount of interactive but offline
precomputation,  which  can  be  performed  before  the  inputs  to  the  sampling  phase  are  known. 
1  Introduction


A threshold cryptographic scheme [DF89] is one that allows any quorum of h out of ℓ trustees to jointly
perform some privileged operation(s) by following a specified protocol, and remains correct and secure even if
up to some t < h of the parties deviate from the protocol adversarially. For example, in a threshold signature
scheme  any  h  trustees  can  sign  an  agreed-upon  message,  and  no  t  malicious  players  (who  may  even  pool  their 
knowledge  and  coordinate  their  actions)  can  prevent  the  signature  from  being  produced,  nor  forge  a  valid 
signature  on  a  new  message.  Similarly,  a  threshold  encryption  scheme  requires  at  least  h  trustees  to  decrypt 
a  ciphertext.  Threshold  cryptography  is  very  useful  for  both  distributing  trust  and  increasing  robustness  in 
systems  that  perform  high-value  operations,  such  as  certificate  authorities  (CAs)  or  private-key  generators  in 
identity-based  encryption  (IBE)  systems. 

Desirable  efficiency  properties  in  a  threshold  system  include:  (1)  efficient  local  computation  by  the 
trustees;  (2)  low  interaction — e.g.,  one  broadcast  message  from  each  party — when  performing  the  privileged 
operations;  and  (3)  key  sizes  and  public  operations  that  are  independent  of  the  number  of  trustees.  For 
example,  while  it  might  require  several  parties  to  sign  a  message,  it  is  best  if  the  signature  can  be  verified 
without  even  being  aware  that  it  was  produced  in  a  distributed  manner. 

Over the years many elegant and rather efficient threshold systems have been developed. To name just a few
representative works, there are simple variants of the ElGamal cryptosystem, Canetti and Goldwasser’s [CG99]
version of the CCA-secure Cramer-Shoup cryptosystem [CS98], and Shoup’s [Sho00] version of the RSA
signature scheme. These systems, along with almost all others in the literature, are based on number-theoretic
problems related to either integer factorization or the discrete logarithm problem in cyclic groups. As is
now well-known, Shor’s algorithm [Sho97] would unfortunately render all these schemes insecure in a
“post-quantum” world with large-scale quantum computers.


Lattice-based cryptography. Recently, lattices have been recognized as a viable foundation for quantum-resistant
cryptography, and the past few years have seen the rapid growth of many rich lattice-based systems.
A fruitful line of research, starting from the work of Gentry, Peikert and Vaikuntanathan (GPV) [GPV08],
has resulted in secure lattice-based hash-and-sign signatures and (hierarchical) identity-based encryption
schemes [CHKP10, ABB10], along with many more applications (e.g., [GKV10, BF11b, BF11a, AFV11]).
All these schemes rely at heart on two nontrivial algorithms: the key-generation algorithm produces a lattice Λ
together with a certain kind of “strong” trapdoor (e.g., a short basis of Λ) [Ajt99, AP09], while the signing/key-extraction
algorithms use the trapdoor to randomly sample a short vector from a discrete Gaussian distribution
over a certain coset Λ + c, which is determined by the message or identity [GPV08]. Initially, both tasks
were rather complicated algorithmically, and in particular the Gaussian sampling algorithm involved several
adaptive iterations, so it was unclear whether either task could be efficiently and securely distributed among
several parties. Recently, however, both key generation and Gaussian sampling have been simplified and made
more efficient and parallel [Pei10, MP12]. This is the starting point for our work.


Our results. We give threshold protocols for the main nontrivial operations in lattice-based signature and
(H)IBE schemes, namely: (1) generating a lattice Λ together with a strong trapdoor of the kind recently
proposed in [MP12], (2) sampling from a discrete Gaussian distribution over a desired coset of Λ, and
(3) delegating a trapdoor for a higher-dimensional extension of Λ. Since these are the only secret-key
operations used in the signature and (H)IBE schemes of [GPV08, CHKP10, ABB10, MP12] and several other
related works, our protocols can be plugged directly into all those schemes to distribute the signing algorithms
and the (H)IBE private-key generators. In Section 4 we show how this is (straightforwardly) done for the
simplest of these applications, namely, the GPV signature scheme [GPV08]; the GPV IBE scheme and other
applications work similarly.
Our protocols have several desirable properties:


•  They provide information-theoretic (i.e., statistical) security for adaptive corruptions. By information-theoretic
security, we mean that the security of the key-generation and sampling protocols themselves
relies on no computational assumption; instead, the application alone determines the assumption
(usually, the Short Integer Solution assumption [Ajt96, MR04] for digital signatures, and Learning With
Errors [Reg05] for identity-based encryption). We work in a version of the universal composability
(UC) framework [Can01], specialized to the threshold setting, and as a result also get strong security
guarantees for protocols under arbitrary composition.


•  They are secure for an optimal threshold of semi-honest or active (malicious) parties, which is determined
by the precise communication model and setup assumption. For example, we can tolerate $h - 1$
semi-honest parties assuming trusted setup (see below), or $t = h - 1$ malicious parties in a model with
both broadcast and private channels, using the verifiable secret sharing scheme of [RB89]. (Recall
that $h$ is the number of (semi-)honest parties the protocol requires to execute successfully, and the
robustness threshold $t$ is an upper bound on the number of malicious parties.)


•  The  public  key  and  trapdoor  “quality”  (i.e.,  the  width  of  the  discrete  Gaussian  that  can  be  sampled 
using  the  trapdoor;  smaller  width  means  higher  quality)  are  essentially  the  same  as  in  the  standalone 
setting. In particular, their sizes are independent of the number of trustees; the individual shares of the
trapdoor  are  the  same  size  as  the  trapdoor  itself;  and  the  protocols  work  for  the  same  lattice  parameters 
as  in  the  standalone  setting,  up  to  small  constant  factors. 

•  They  have  noninteractive  and  very  efficient  online  phases  (corresponding  to  the  signing  or  key-extraction 
operations),  assuming  either  (1)  a  setup  phase  in  which  certain  shares  are  distributed  by  a  trusted  party 
(e.g.,  as  part  of  key  generation),  or  (2)  the  parties  themselves  perform  a  sufficient  amount  of  interactive 
precomputation  in  an  offline  phase  (without  relying  on  any  trusted  party).  We  provide  protocols  for 
these two settings in Section 3 and Appendix A, respectively.


Regarding the final item, the trusted setup model is the one used by Canetti and Goldwasser [CG99]
for constructing chosen ciphertext-secure threshold cryptosystems: as part of the key-generation
process, a trusted party also distributes shares of some appropriately distributed secrets to the parties, which
they can later use to perform an a priori bounded number of noninteractive threshold operations. Or, in lieu of
a trusted party, the players can perform some interactive precomputation (offline, before the desired coset is
known) to generate the needed randomness. The downside is that this precomputation is somewhat expensive,
since the only solution we have for one important step (namely, sampling shares of a Gaussian-distributed
value over $\mathbb{Z}$) is to use somewhat generic information-theoretic multiparty computation tools. On the plus
side, the circuit for this sampling task is rather shallow, with depth just slightly super-constant $\omega(1)$, so the
round complexity of the precomputation is not very high. We emphasize that the expensive precomputation is
executed offline, before the application decides which lattice cosets will be sampled from, and that the online
protocols remain efficient and non-interactive.

Our protocols rely on the very simple form of the new type of strong trapdoor recently proposed in [MP12],
and the parallel and offline nature of recent standalone Gaussian sampling algorithms [Pei10, MP12].¹ A key
technical challenge is that the security of the sampling algorithms from [Pei10, MP12] crucially relies on
the secrecy of some intermediate random variables known as “perturbations.” However, in order to obtain a
noninteractive protocol we need the parties to publicly reveal certain information about these perturbations.
Fortunately, we can show that the leaked information is indeed simulatable, and so security is unharmed. See
Section 3 and in particular Lemma 3.2 for further details.


¹ In particular, it appears very difficult to implement, in a noninteractive threshold fashion, iterative sampling
algorithms like those from [Kle00, GPV08], which use the classical trapdoor notion of a short basis.

Open problems. In addition to simple, non-interactive protocols for discrete Gaussian sampling with
trusted setup, in Appendix A we give efficient protocols for discrete Gaussian sampling that avoid both trusted
setup and online interaction by using (offline) access to a functionality $\mathcal{F}_{\mathrm{SampZ}}$, which produces shares of
Gaussian-distributed values over the integers $\mathbb{Z}$ (see Appendix A.1 for details). We show how to instantiate
$\mathcal{F}_{\mathrm{SampZ}}$ using a (somewhat inefficient) interactive protocol based on generic MPC techniques. It remains an
interesting open problem to design discrete Gaussian sampling protocols without trusted setup whose offline
precomputation is efficient and/or non-interactive as well. An efficient realization of $\mathcal{F}_{\mathrm{SampZ}}$ would yield such
a solution, but there may be other routes as well.

Another intriguing problem is to give a simple and noninteractive threshold protocol for inverting the LWE
function $g_{\mathbf{A}}(\mathbf{s}, \mathbf{e}) = \mathbf{s}^t \mathbf{A} + \mathbf{e}^t \bmod q$ (for short error vector $\mathbf{e}$) using a shared trapdoor. We find it surprising
that, while in the standalone setting this inversion task is conceptually and algorithmically much simpler than
Gaussian sampling, we have not yet been able to find a simple threshold protocol for it.² Such a protocol
could, for example, be useful for obtaining threshold analogues of the chosen ciphertext-secure cryptosystems
from [Pei09, MP12], without going through a generic IBE-to-CCA transformation [BCHK07].


Related work in threshold lattice cryptography. A few works have considered lattice cryptography in
the threshold setting. For encryption schemes, Bendlin and Damgård [BD10] gave a threshold version of
Regev’s CPA-secure encryption scheme based on the learning with errors (LWE) problem [Reg05]. Related
work by Myers et al. [MSs11] described threshold decryption for fully homomorphic cryptosystems. Xie et
al. [XXZ11] gave a threshold CCA-secure encryption scheme from any lossy trapdoor function (and hence
from lattices/LWE [PW08]), though its public key and encryption runtime grow at least linearly with the
number of trustees. For signatures, Feng et al. [FGM10] gave a threshold signature scheme where signing
proceeds sequentially through each trustee, making the scheme highly interactive; also, the scheme is based
on NTRUSign, which has been broken [NR06]. Cayrel et al. [CLRS10] gave a lattice-based threshold ring
signature scheme, in which at least $t$ trustees are needed to create an anonymous signature. In that system, each
trustee has its own public key, and verification time grows linearly with the number of trustees. In summary,
lattice-based threshold schemes to date have either been concerned with distributing the decryption operation
in public-key cryptosystems, and/or have lacked key efficiency properties typically asked of threshold systems
(which our protocols do enjoy). Also, other important applications such as (H)IBE have yet to be realized in a
threshold manner.


Organization. The remainder of the paper is organized as follows. In Section 2 we overview the relevant
background on lattices, secret sharing, and the UC framework. In Section 3 we review the standalone
key-generation and discrete Gaussian sampling algorithms of [MP12], present our functionalities for these
algorithms in the threshold setting, and show how these functionalities can be implemented efficiently
and noninteractively using trusted setup. At the end of Section 3 we additionally provide a functionality
and protocol for trapdoor delegation. Finally, in Section 4 we detail a simple example application of our
protocols, namely, a threshold version of the GPV signature scheme [GPV08] realizing the threshold signature
functionality of [ADN06]. In the appendix, we remove the trusted setup assumption and show how to instead
use offline interaction to implement all our functionalities.


2  Preliminaries 

We denote the reals by $\mathbb{R}$ and the integers by $\mathbb{Z}$. For a positive integer $\ell$, we let $[\ell] = \{1, \ldots, \ell\}$.

² We note that it is possible to give a threshold protocol using a combination of Gaussian sampling and trapdoor
delegation [CHKP10, MP12], but it is obviously no simpler than Gaussian sampling alone.




A square symmetric real matrix $\Sigma$ is positive definite, written $\Sigma > 0$, if $\mathbf{x}^t \Sigma \mathbf{x} > 0$ for all nonzero $\mathbf{x}$.
Positive definiteness defines a partial ordering on symmetric matrices: we say that $\Sigma_1 > \Sigma_2$ if $(\Sigma_1 - \Sigma_2) > 0$.
For any nonsingular matrix $\mathbf{B} \in \mathbb{R}^{n \times n}$, the symmetric matrix $\Sigma = \mathbf{B}\mathbf{B}^t$ is positive definite. We say that
$\mathbf{B}$ is a square root of $\Sigma > 0$, written $\mathbf{B} = \sqrt{\Sigma}$, if $\mathbf{B}\mathbf{B}^t = \Sigma$. Every $\Sigma > 0$ has a square root; moreover,
the square root is unique up to right-multiplication by an orthogonal matrix, i.e., $\mathbf{B}' = \sqrt{\Sigma}$ if and only if
$\mathbf{B}' = \mathbf{B}\mathbf{Q}$ for some orthogonal matrix $\mathbf{Q}$. A square root can be computed efficiently using, e.g., the Cholesky
decomposition. The largest singular value (also called spectral norm or operator norm) of a real matrix $\mathbf{X}$ is
defined as $s_1(\mathbf{X}) = \max_{\mathbf{u} \neq \mathbf{0}} \|\mathbf{X}\mathbf{u}\| / \|\mathbf{u}\|$. For convenience, we sometimes write a scalar $s$ to mean the scaled
identity matrix $s\mathbf{I}$, whose dimension will be clear from context.

2.1  Continuous  Gaussians 

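The $n$-dimensional Gaussian function $\rho \colon \mathbb{R}^n \to (0, 1]$ is defined as

$$\rho(\mathbf{x}) = \exp(-\pi \cdot \|\mathbf{x}\|^2) = \exp(-\pi \cdot \langle \mathbf{x}, \mathbf{x} \rangle).$$

Applying a linear transformation given by a nonsingular real matrix $\mathbf{B}$ yields the Gaussian function

$$\rho_{\mathbf{B}}(\mathbf{x}) := \rho(\mathbf{B}^{-1}\mathbf{x}) = \exp\left(-\pi \cdot \langle \mathbf{B}^{-1}\mathbf{x}, \mathbf{B}^{-1}\mathbf{x} \rangle\right) = \exp\left(-\pi \cdot \mathbf{x}^t \Sigma^{-1} \mathbf{x}\right),$$

where $\Sigma = \mathbf{B}\mathbf{B}^t > 0$. Because $\rho_{\mathbf{B}}$ is distinguished only up to $\Sigma$, we usually refer to it as $\rho_{\sqrt{\Sigma}}$.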

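Normalizing $\rho_{\sqrt{\Sigma}}$ by its total measure $\int_{\mathbb{R}^n} \rho_{\sqrt{\Sigma}}(\mathbf{x})\, d\mathbf{x} = \sqrt{\det \Sigma}$ over $\mathbb{R}^n$, we obtain the probability
distribution function of the (continuous) Gaussian distribution $D_{\sqrt{\Sigma}}$. It is easy to check that a random
variable $\mathbf{x}$ having distribution $D_{\sqrt{\Sigma}}$ can be written as $\sqrt{\Sigma} \cdot \mathbf{z}$, where $\mathbf{z}$ has spherical Gaussian distribution $D_1$.
Therefore, the random variable $\mathbf{x}$ has covariance

$$\mathbb{E}_{\mathbf{x} \sim D_{\sqrt{\Sigma}}}\left[\mathbf{x}\mathbf{x}^t\right] = \sqrt{\Sigma} \cdot \mathbb{E}_{\mathbf{z} \sim D_1}\left[\mathbf{z}\mathbf{z}^t\right] \cdot \sqrt{\Sigma}^t = \sqrt{\Sigma} \cdot \frac{\mathbf{I}}{2\pi} \cdot \sqrt{\Sigma}^t = \frac{\Sigma}{2\pi},$$

by linearity of expectation. (The $\mathbf{I}/(2\pi)$ covariance of $\mathbf{z} \sim D_1$ arises from the independence of its entries,
which are each distributed as $D_1$ in one dimension, and therefore have variance $1/(2\pi)$.) For convenience, in
this paper we implicitly scale all covariance matrices by a $2\pi$ factor, and refer to $\Sigma$ as the covariance matrix
of $D_{\sqrt{\Sigma}}$.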

2.2  Lattices  and  Discrete  Gaussians 

A lattice $\Lambda$ is a discrete additive subgroup of $\mathbb{R}^m$ for some $m \ge 0$. In this work we are only concerned with
full-rank integer lattices, which are additive subgroups of $\mathbb{Z}^m$ with finite index. Most recent cryptographic
applications use a particular family of so-called $q$-ary integer lattices, which contain $q\mathbb{Z}^m$ as a sublattice for
some integer $q$, which in this work will always be bounded by $\mathrm{poly}(n)$. For positive integers $n$ and $q$, let
$\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ be arbitrary, and define the full-rank $m$-dimensional $q$-ary lattice

$$\Lambda^{\perp}(\mathbf{A}) = \{\mathbf{z} \in \mathbb{Z}^m : \mathbf{A}\mathbf{z} = \mathbf{0} \bmod q\}.$$

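For any $\mathbf{u} \in \mathbb{Z}_q^n$ admitting an integral solution $\mathbf{x} \in \mathbb{Z}^m$ to $\mathbf{A}\mathbf{x} = \mathbf{u} \bmod q$, define the coset (or shifted lattice)

$$\Lambda_{\mathbf{u}}^{\perp}(\mathbf{A}) = \Lambda^{\perp}(\mathbf{A}) + \mathbf{x} = \{\mathbf{z} \in \mathbb{Z}^m : \mathbf{A}\mathbf{z} = \mathbf{u} \bmod q\}.$$

Note that for $n, m, q \ge 2$ and $m \ge Cn \log q$ for some fixed constant $C > 1$, the columns of a uniformly
random matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ generate all of $\mathbb{Z}_q^n$ with all but $\mathrm{negl}(n)$ probability.

Let $\Lambda \subset \mathbb{R}^m$ be a lattice, let $\mathbf{c} \in \mathbb{R}^m$, and let $\Sigma > 0$ be a positive definite matrix. The discrete Gaussian
distribution $D_{\Lambda + \mathbf{c}, \sqrt{\Sigma}}$ is simply the Gaussian distribution restricted so that its support is the coset $\Lambda + \mathbf{c}$.
That is, for all $\mathbf{x} \in \Lambda + \mathbf{c}$,

$$D_{\Lambda + \mathbf{c}, \sqrt{\Sigma}}(\mathbf{x}) = \frac{\rho_{\sqrt{\Sigma}}(\mathbf{x})}{\rho_{\sqrt{\Sigma}}(\Lambda + \mathbf{c})} \propto \rho_{\sqrt{\Sigma}}(\mathbf{x}).$$

A discrete Gaussian is said to be spherical with parameter $s > 0$ if its covariance matrix is $s^2 \mathbf{I}$.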
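As a rough illustration of the definition above, the following sketch builds the one-dimensional discrete Gaussian over a coset $\mathbb{Z} + c$ by normalizing $\rho_s(x) = \exp(-\pi x^2 / s^2)$ over a truncated support. The function names, the truncation window `tail`, and the table-based sampler are assumptions made for illustration only; they are not taken from the paper and are not a secure or constant-time sampler.

```python
import math
import random

def rho(x, s):
    """One-dimensional Gaussian function rho_s(x) = exp(-pi * x^2 / s^2)."""
    return math.exp(-math.pi * (x / s) ** 2)

def coset_points(c, s, tail=12):
    """Points of the coset Z + c inside a window of about tail * s (an assumed cutoff)."""
    bound = math.ceil(tail * s)
    return [z + c for z in range(-bound, bound + 1)]

def coset_mass(c, s):
    """Approximate total mass rho_s(Z + c) over the truncated window."""
    return sum(rho(x, s) for x in coset_points(c, s))

def discrete_gaussian_pmf(c, s):
    """Support and probabilities of D_{Z+c, s}: rho_s(x) / rho_s(Z + c)."""
    support = coset_points(c, s)
    total = coset_mass(c, s)
    return support, [rho(x, s) / total for x in support]

def sample(c, s, rng):
    """Draw one sample from the truncated table (illustrative only)."""
    support, probs = discrete_gaussian_pmf(c, s)
    return rng.choices(support, weights=probs, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    print([sample(0.25, 4.0, rng) for _ in range(5)])
    # For a wide parameter, the coset mass barely depends on the shift c.
    print(coset_mass(0.0, 4.0), coset_mass(0.5, 4.0))
```

The last line hints at the smoothing behaviour recalled next in the text: once the parameter is large enough, every coset carries essentially the same mass.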

We recall an important definition and some useful properties of discrete Gaussian distributions on lattices.
For $\varepsilon > 0$, the smoothing parameter [MR04] $\eta_\varepsilon(\Lambda)$ of a lattice $\Lambda$ is defined as the smallest $s > 0$ such that
$\rho_{1/s}(\Lambda^* \setminus \{\mathbf{0}\}) \le \varepsilon$, where $\Lambda^*$ is the dual lattice (whose precise definition we will not need here). Here we
generalize the smoothing parameter to non-spherical Gaussians; note that this definition is consistent with the
partial ordering on positive definite matrices.

Definition 2.1 (Smoothing parameter). Let $\Sigma > 0$ be any positive definite matrix. We say that $\sqrt{\Sigma} \ge \eta_\varepsilon(\Lambda)$
if $\eta_\varepsilon(\sqrt{\Sigma}^{-1} \cdot \Lambda) \le 1$, or equivalently, if $\rho_{\sqrt{\Sigma^{-1}}}(\Lambda^* \setminus \{\mathbf{0}\}) \le \varepsilon$.


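The following lemma is a slight generalization of [Reg05, Claim 3.8] (see also [MR04, Lemma 4.1]) to
non-spherical Gaussians, obtained by applying a linear transformation to the Gaussian function and lattice.
Informally, it says that every coset of $\Lambda$ has essentially the same mass under $\rho_{\sqrt{\Sigma}}$ when $\sqrt{\Sigma}$ exceeds the
smoothing parameter of $\Lambda$. The corollary then follows by a routine calculation.


Lemma 2.2. For any $m$-dimensional lattice $\Lambda$, real $\varepsilon > 0$, $r \ge \eta_\varepsilon(\Lambda)$, and $\mathbf{c} \in \mathbb{R}^m$, we have $\rho_r(\Lambda + \mathbf{c}) \in
[1 \pm \varepsilon] \cdot Z_{\Lambda, r}$, where $Z_{\Lambda, r}$ depends only on $\Lambda$ and $r$ (not $\mathbf{c}$).


Corollary 2.3. Let $\Lambda' \subseteq \Lambda$ be full-rank lattices, and let $\sqrt{\Sigma} \ge \eta_\varepsilon(\Lambda')$ for some $\varepsilon > 0$. For $\mathbf{x} \leftarrow D_{\Lambda, \sqrt{\Sigma}}$, the
marginal distribution of $\mathbf{c} = \mathbf{x} \bmod \Lambda'$ is within statistical distance $\varepsilon/2$ from uniform over $\Lambda/\Lambda'$, and the
conditional distribution of $\mathbf{x}$ given $\mathbf{c}$ is $D_{\Lambda' + \mathbf{c}, \sqrt{\Sigma}}$.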

The following special case of [MP12, Lemma 2.4] says that for uniformly random $\mathbf{A}$ and appropriate
parameters, the lattice $\Lambda^{\perp}(\mathbf{A})$ has small smoothing parameter with very high probability.

Lemma 2.4. Let $n, m, q \ge 2$ be positive integers and $C > 1$ be a fixed constant such that $m \ge Cn \log q$,
and let $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ be uniformly random. For any fixed $\omega_n = \omega(\sqrt{\log n})$ there exists some $\varepsilon = \mathrm{negl}(n)$ such
that $\eta_\varepsilon(\Lambda^{\perp}(\mathbf{A})) \le \omega_n$ except with probability $2^{-\Omega(n)}$.


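Finally, we need the “convolution lemma” of [Pei10, Theorem 3.1].

Lemma 2.5. Let $\Sigma_1, \Sigma_2 > 0$ be positive definite matrices, with $\Sigma = \Sigma_1 + \Sigma_2 > 0$ and $\Sigma_3^{-1} = \Sigma_1^{-1} + \Sigma_2^{-1} > 0$.
Let $\Lambda_1, \Lambda_2$ be lattices such that $\sqrt{\Sigma_1} \ge \eta_\varepsilon(\Lambda_1)$ and $\sqrt{\Sigma_3} \ge \eta_\varepsilon(\Lambda_2)$ for some positive $\varepsilon \le 1/2$, and let
$\mathbf{c}_1, \mathbf{c}_2 \in \mathbb{R}^m$ be arbitrary. In the following experiment:

choose $\mathbf{x}_2 \leftarrow D_{\Lambda_2 + \mathbf{c}_2, \sqrt{\Sigma_2}}$, then choose $\mathbf{x}_1 \leftarrow \mathbf{x}_2 + D_{\Lambda_1 + \mathbf{c}_1 - \mathbf{x}_2, \sqrt{\Sigma_1}}$,

the marginal distribution of $\mathbf{x}_1$ is within statistical distance $8\varepsilon$ of $D_{\Lambda_1 + \mathbf{c}_1, \sqrt{\Sigma}}$.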

Throughout the paper we often attach a factor $\omega_n = \omega(\sqrt{\log n})$, which represents an arbitrary
fixed function that grows asymptotically faster than $\sqrt{\log n}$, to Gaussian parameters $\sqrt{\Sigma}$ (or $\omega_n^2$ to covariance
matrices $\Sigma$). In exposition we usually omit reference to these factors, but we always retain them where needed
in formal expressions.

2.3  Secret  Sharing  for  Additive  Groups


In this work we will need to distribute secret lattice points among multiple players, so that any sufficiently
large number of players is able to reconstruct the points, but smaller subsets collectively get no information
about the secret. Because a lattice $\Lambda$ is an infinite additive group (and in particular is not a field), it is not
immediately amenable to standard secret-sharing techniques like those of [Sha79]. Fortunately, for our
purposes it will suffice to share elements of a suitable finite quotient group $G = \Lambda/\Lambda'$, where the sublattice
$\Lambda' \subseteq \Lambda$ is “sparse” enough in $\Lambda$ that an element of $G$ identifies an element of $\Lambda$ with enough specificity
for our applications. There is a rich theory of secret sharing for arbitrary additive groups and modules,
e.g., [DF94, Feh98]. Here we recall the relevant material in enough generality for our purposes.

Let $G$ be a finite abelian (additive) group with identity element 0. The exponent of $G$, denoted $e(G)$, is
the smallest positive integer $m$ such that $mg = g + g + \cdots + g = 0$ for every $g \in G$. We have that $G$ is a
module over the ring $R = \mathbb{Z}_{e(G)}$, which gives the following form of Shamir’s $(t+1)$-out-of-$\ell$ secret-sharing
scheme [Sha79]. Let $t < \ell$ be positive integers, where $t$ denotes a bound on the number of corrupt players
out of $\ell$ total. Suppose that $G$ is an $R$-module for some ring $R$ which has efficiently computable operations
and $\ell + 1$ known elements $U = \{r_0 = 0, r_1, \ldots, r_\ell\} \subseteq R$ such that $r_i - r_j$ is invertible in $R$ (i.e., a unit) for
every $i \neq j$. For example, in our protocols we will have $e(G) = q^d$ for some public integers $q \ge 2$ and $d \ge 1$,
so we can take $R = \mathbb{Z}_{q^d}$ and $r_i = i \bmod q^d$, as long as $\ell$ is smaller than every prime divisor of $q$. When this
condition does not hold, we can use an extension ring instead, as described below.

To share a value $g \in G$, one chooses a formal polynomial $f(X) = \sum_{j=0}^{t} f_j X^j \in G[X]$ of degree at
most $t$, where $f_0 = g$ and the $f_j \in G$ for $j \ge 1$ are uniformly random and independent. Player $i \in [\ell]$ is
publicly associated with the value $r_i \in R$, and gets the share $s_i = f(r_i) = \sum_j r_i^j f_j \in G$. Usually we let $f$
be implicit, denoting the $i$th player’s share as $[\![g]\!]^i$ and the tuple of all shares by $[\![g]\!]$. Note that any product
group $G^k$ is also an additive group with exponent $e(G)$, so we can share vectors or matrices with entries in $G$
as above, using the same ring $\mathbb{Z}_{e(G)}$. (Equivalently, this is just an independent entry-wise sharing.)

The above scheme has several important properties (whose proofs are straightforward; see, e.g., [Feh98]):

•  It is ideal: the shares $s_i \in G$ belong to the same set as the shared value $g \in G$.

•  It is perfectly secret: for any shared value $g \in G$, any tuple of up to $t$ shares $s_i$ is distributed uniformly.

•  It is perfectly correct and robust: any $t + 1$ shares $s_i$ of $g$ (along with their corresponding evaluation
points $r_i$) can be used to efficiently recover $f(X)$, and hence $g = f_0 = f(0)$, by interpolation.

Moreover, given at least $3t + 1$ values $s_i$ (along with the corresponding evaluation points $r_i$), where at
least $2t + 1$ are correct shares $s_i = f(r_i)$ of $g$ and the remaining $t$ may be arbitrary, one can efficiently
recover $f(X)$ and hence $g = f_0$ using, e.g., the Welch-Berlekamp algorithm for unambiguous decoding
of Reed-Solomon codes. (The algorithm is usually described for codes defined over finite fields, but its
proof of correctness goes through without modification in our setting.)

•  It is homomorphic: if $g, g' \in G$ have respective shares $s_i = f(r_i)$, $s'_i = f'(r_i)$ for $i \in [\ell]$, then
$s_i + s'_i = (f + f')(r_i)$ and $r s_i = (rf)(r_i)$ are respective shares of $g + g'$ and $rg$ for any $r \in R$.
Moreover, let $G' \subseteq G$ be a subgroup; then $\bar{s}_i = s_i \bmod G'$ are shares of $\bar{g} = g \bmod G'$, via the
polynomial $\bar{f}(X) = f(X) \bmod G'[X]$. Additionally, if $g \in G'$, then $s_i - \bar{s}_i \in G'$ are shares of $g$.
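The sharing, interpolation, and homomorphic properties listed above can be sketched in a few lines for the simplest case $G = R = \mathbb{Z}_{q^d}$ with evaluation points $r_i = i$. This is a minimal sketch under stated assumptions: the function names `share` and `reconstruct` and the toy parameters ($q^d = 49$, $t = 2$, $\ell = 5$) are hypothetical choices for illustration, the randomness source is not cryptographic, and the robust (Welch-Berlekamp) decoding step is not shown.

```python
import random

def share(secret, t, num_players, modulus, rng):
    """Share `secret` in Z_modulus with a degree-<=t polynomial f, f(0) = secret.

    Player i (1 <= i <= num_players) receives f(i) mod modulus.
    """
    coeffs = [secret % modulus] + [rng.randrange(modulus) for _ in range(t)]

    def f(x):
        return sum(c * pow(x, j, modulus) for j, c in enumerate(coeffs)) % modulus

    return {i: f(i) for i in range(1, num_players + 1)}

def reconstruct(shares, modulus):
    """Lagrange-interpolate f(0) from a dict {evaluation point: share}.

    Requires every difference of evaluation points to be a unit modulo
    `modulus`, which holds when the points are smaller than every prime
    divisor of the modulus.
    """
    secret = 0
    for i, s_i in shares.items():
        coeff = 1
        for j in shares:
            if j != i:
                inv = pow((i - j) % modulus, -1, modulus)  # unit inverse
                coeff = coeff * (-j) * inv % modulus
        secret = (secret + coeff * s_i) % modulus
    return secret

if __name__ == "__main__":
    rng = random.Random(0)          # NOT cryptographic; illustration only
    modulus, t, ell = 7 ** 2, 2, 5  # ell = 5 < 7, the only prime divisor
    s_g = share(10, t, ell, modulus, rng)
    s_h = share(30, t, ell, modulus, rng)
    subset = [1, 3, 5]              # any t + 1 players suffice
    print(reconstruct({i: s_g[i] for i in subset}, modulus))
    # Homomorphism: adding shares locally yields shares of the sum.
    print(reconstruct({i: (s_g[i] + s_h[i]) % modulus for i in subset}, modulus))
```

Note that interpolation only needs inverses of differences of evaluation points, which is why the condition on the prime divisors of $q$ matters even though $\mathbb{Z}_{q^d}$ is not a field.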


Secret sharing with extension rings. The above scheme works when the number of parties $\ell$ is less than
every prime divisor of $e(G)$. When this is not the case (e.g., when using $q = 2^k$, which is a convenient
choice for the trapdoor construction described in Section 3.1), we can instead share elements from the vector
group $G^k$, which is a module over a certain extension ring of $\mathbb{Z}_{e(G)}$ that has a suitable set $U$ of size $p^k$, where $p$
is the smallest prime divisor of $e(G)$. By choosing $k \ge \log_p(\ell + 1)$, we can share elements of $G$ among $\ell$
players using shares in $G^k$, or even amortize the sharing of up to $k$ elements in $G$ at a time.

In brief, we use the extension ring $R = \mathbb{Z}_{e(G)}[X]/F(X)$ for any monic degree-$k$ polynomial $F(X) =
\sum_{i=0}^{k} F_i X^i \in \mathbb{Z}_{e(G)}[X]$ that is irreducible modulo every prime dividing $e(G)$. Then it can be verified
that $G^k$ is an $R$-module, where multiplication $R \times G^k \to G^k$ is defined by the rule $X \cdot (g_0, \ldots, g_{k-1}) =
(0, g_0, \ldots, g_{k-2}) - (F_0 \cdot g_{k-1}, \ldots, F_{k-1} \cdot g_{k-1})$. An element of $R$ is a unit if and only if it is nonzero modulo
every prime integer divisor of $e(G)$, so letting $p$ be the smallest such divisor, the polynomial residues in $R$
with coefficients in $\{0, \ldots, p - 1\}$ give us $p^k$ elements $r_i \in R$ such that $r_i - r_j$ is a unit for all $i \neq j$, as
needed. See, for example, [DF94] or [Feh98, Chapter 3] for full details.

Verifiable secret sharing. To recover a shared value, our protocols instruct honest parties to broadcast
their respective shares and then reconstruct the value from the announced shares. As mentioned above, the
Welch-Berlekamp algorithm efficiently reconstructs the shared value given at least $2t + 1$ correct shares and
up to $t$ incorrect ones (which may come from malicious parties), which means we can tolerate any $t < \ell/3$
malicious parties. Assuming appropriate communication channels, it is possible to improve this threshold to
any $t < \ell/2$ malicious parties by using a verifiable secret sharing (VSS) protocol, e.g., the one of [RB89].
The share-distribution and reconstruction steps of our protocols can be straightforwardly modified to use VSS,
but we omit these modifications for simplicity of exposition.

2.4  UC  Framework 

We frame our results in the Universal Composability (UC) framework [Can00, Can01]. In the UC framework,
security is defined by considering a probabilistic polynomial-time (PPT) machine $\mathcal{Z}$, called the environment.
In coordination with an adversary that may corrupt some of the players, $\mathcal{Z}$ chooses inputs and observes
the outputs of a protocol executed in one of two worlds: a “real” world in which the parties interact with
each other in some specified protocol $\pi$ while a dummy adversary $\mathcal{A}$ (controlled by $\mathcal{Z}$) corrupts players and
controls their interactions with honest players, and an “ideal” world in which the players interact directly
with a functionality $\mathcal{F}$, while a simulator $\mathcal{S}$ (communicating with $\mathcal{Z}$) corrupts players and controls their
interactions with $\mathcal{F}$. The views of the environment in these executions are respectively denoted $\mathrm{REAL}_{\pi, \mathcal{A}, \mathcal{Z}}$
and $\mathrm{IDEAL}_{\mathcal{F}, \mathcal{S}, \mathcal{Z}}$, and the protocol is said to realize the functionality if these two views are indistinguishable. In
this work we are concerned solely with statistical indistinguishability (which is stronger than the computational
analogue), denoted by the relation $\overset{s}{\approx}$.

Definition 2.6. A protocol $\pi$ statistically realizes a functionality $\mathcal{F}$ (or alternatively, is a UC-secure
implementation of $\mathcal{F}$) if for any probabilistic polynomial-time (PPT) adversary $\mathcal{A}$, there exists a PPT simulator $\mathcal{S}$
such that for all PPT environments $\mathcal{Z}$, we have $\mathrm{IDEAL}_{\mathcal{F}, \mathcal{S}, \mathcal{Z}} \overset{s}{\approx} \mathrm{REAL}_{\pi, \mathcal{A}, \mathcal{Z}}$.

The universal composition theorem [Can01] informally states that any UC-secure protocol remains
secure under concurrent general composition. This allows for the modular design of functionalities and
protocols which can be composed to produce secure higher-level protocols. Our functionalities implicitly use
standard conventions like delayed public and private outputs, corruptions, etc., which are addressed in detail
in [Can00, Can01].

UC framework for threshold protocols. We consider a specialized case of the UC framework that is
appropriate for modeling threshold protocols. All of our functionalities are called with a session ID of the
form $sid = (\mathcal{P}, sid')$, where $\mathcal{P}$ is a set of $\ell$ parties representing the individual trustees in the threshold
protocol. We prove security against adversaries that may adaptively corrupt a certain bounded number of
the parties over the entire lifetime of a protocol, and consider both the semi-honest case (in which corrupted
parties still execute the protocol faithfully) and the malicious case. At the time of corruption, the entire
view of the player to that point (and beyond) is revealed to the adversary; in particular, we do not assume
secure erasures. For robustness, we additionally require that when the environment issues a command to a
functionality/protocol, it always does so for at least $h$ honest parties in the same round.

Many  of  our  protocols  require  the  parties  to  maintain  and  use  consistent  local  states,  corresponding  to 
certain  shared  random  variables  that  are  consumed  by  the  protocols.  We  note  that  synchronizing  their  local 
states  may  be  nontrivial,  if  not  every  party  is  involved  with  executing  every  command.  For  this  reason  we 
assume some mechanism for coordinating local state, such as hashing as suggested in [CG99], which deals
with similar synchronization issues.


3  Threshold  Key  Generation,  Gaussian  Sampling,  and  Trapdoor  Delegation 


In this section, we present UC functionalities and protocols for generating a lattice with a shared trapdoor,
for sampling from a coset of that lattice, and for securely delegating a trapdoor of a higher-dimensional
extension of the lattice. As an example application of these functionalities, in Section 4 we describe
threshold variants of the GPV signature and IBE schemes of [GPV08]; other signature and (H)IBE schemes
(e.g., [CHKP10, ABB10, MP12]) can be adapted similarly (where delegation is needed for HIBE).

We start in Section 3.1 by recalling the recent standalone (non-threshold) key generation and discrete
Gaussian sampling algorithms of [MP12], which form the basis of our protocols. In Section 3.2 we present the
two main functionalities $\mathcal{F}_{\mathrm{KG}}$ (key generation) and $\mathcal{F}_{\mathrm{GS}}$ (Gaussian sampling) corresponding to the standalone
algorithms.


Since key generation tends to be rare in applications, $\mathcal{F}_{\mathrm{KG}}$ can be realized using trusted setup; alternatively,
in Appendix A we realize $\mathcal{F}_{\mathrm{KG}}$ via an interactive protocol without trusted setup.

To realize $\mathcal{F}_{\mathrm{GS}}$, we define two lower-level “helper” functionalities $\mathcal{F}_{\mathrm{Perturb}}$ and $\mathcal{F}_{\mathrm{Correct}}$, and give in
Section 3.3 an efficient noninteractive protocol that realizes $\mathcal{F}_{\mathrm{GS}}$ using access to them. Section 3 also
realizes the helper functionalities noninteractively using trusted setup, and Appendix A realizes them
using offline precomputation (instead of trusted setup).

Finally, in Section 3.4 we give a functionality $\mathcal{F}_{\mathrm{DelTrap}}$ and protocol for trapdoor delegation.


3.1  Trapdoors  and  Standalone  Algorithms 


We recall the notion of a (strong) lattice trapdoor and associated algorithms recently introduced by Micciancio
and Peikert [MP12] (see that paper for full details and proofs). Let $n$ and $q$ be positive integers and $k = \lceil \lg q \rceil$.
Define the “gadget” vector $\mathbf{g} = (1, 2, 4, \ldots, 2^{k-1}) \in \mathbb{Z}_q^k$ and matrix

$$\mathbf{G} := \mathbf{I}_n \otimes \mathbf{g}^t \in \mathbb{Z}_q^{n \times nk}.$$

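To make the gadget matrix concrete, here is a minimal sketch that builds $\mathbf{G} = \mathbf{I}_n \otimes \mathbf{g}^t$ and finds a short integer vector $\mathbf{x}$ with $\mathbf{G}\mathbf{x} = \mathbf{u} \bmod q$ by plain binary decomposition. The helper names are hypothetical, and the deterministic bit decomposition is only a stand-in chosen for illustration: it produces one short preimage of the syndrome, not a discrete Gaussian sample.

```python
def gadget_vector(q):
    """Return g = (1, 2, 4, ..., 2^(k-1)) with k = ceil(lg q), for q >= 2."""
    k = (q - 1).bit_length()
    return [1 << i for i in range(k)]

def gadget_matrix(n, q):
    """Return G = I_n (tensor) g^t as an n x nk list of rows."""
    g = gadget_vector(q)
    k = len(g)
    return [[g[j - i * k] if i * k <= j < (i + 1) * k else 0
             for j in range(n * k)] for i in range(n)]

def bit_decompose(u, q):
    """Binary digits (least significant first) of each entry of u mod q."""
    k = (q - 1).bit_length()
    return [((u_i % q) >> b) & 1 for u_i in u for b in range(k)]

def mat_vec(M, v, q):
    """Matrix-vector product reduced modulo q."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]

if __name__ == "__main__":
    n, q = 2, 13
    G = gadget_matrix(n, q)
    u = [5, 12]
    x = bit_decompose(u, q)      # entries in {0, 1}
    print(G)
    print(x, mat_vec(G, x, q))   # G x = u mod q
```

Because each block of $\mathbf{G}$ only touches its own $k$ coordinates, the $n$ syndromes are handled independently, which mirrors the direct-sum structure of the corresponding lattice.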
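The $k$-dimensional lattice $\Lambda^{\perp}(\mathbf{g}^t) \subset \mathbb{Z}^k$, and hence also the $nk$-dimensional lattice $\Lambda^{\perp}(\mathbf{G})$, has smoothing
parameter bounded by $s_g \cdot \omega_n$, where $s_g \le \sqrt{5}$ is a known constant. There are efficient algorithms that, given
any desired syndrome $u \in \mathbb{Z}_q$, sample from a discrete Gaussian distribution over the coset $\Lambda_u^{\perp}(\mathbf{g}^t)$ for any
given parameter $s \ge s_g \cdot \omega_n$. Since $\Lambda^{\perp}(\mathbf{G}) \subset \mathbb{Z}^{nk}$ is the direct sum of $n$ copies of $\Lambda^{\perp}(\mathbf{g}^t)$, discrete Gaussian
sampling over a desired coset $\Lambda_{\mathbf{u}}^{\perp}(\mathbf{G})$ (with parameter $s \ge s_g \cdot \omega_n$) can be accomplished by concatenating $n$
independent samples over appropriate cosets of $\Lambda^{\perp}(\mathbf{g}^t)$.

Definition 3.1 ([MP12]). Let $m \ge nk$ be an integer and define $\bar{m} = m - nk$. For $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, we say that
$\mathbf{R} \in \mathbb{Z}^{\bar{m} \times nk}$ is a trapdoor for $\mathbf{A}$ with tag $\mathbf{H}^* \in \mathbb{Z}_q^{n \times n}$ if $\mathbf{A} \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] = \mathbf{H}^* \cdot \mathbf{G}$. The quality of the trapdoor is
defined to be the spectral norm $s_1(\mathbf{R})$.

Note that $\mathbf{H}^*$ is uniquely determined and efficiently computable from $\mathbf{R}$, because $\mathbf{G}$ contains the $n$-by-$n$
identity as a submatrix. Note also that if $\mathbf{R}$ is a trapdoor for $\mathbf{A}$ with tag $\mathbf{H}^*$, then it is also a trapdoor for
$\mathbf{A}_{\mathbf{H}} := \mathbf{A} - [\mathbf{0} \mid \mathbf{H}\mathbf{G}]$ with tag $\mathbf{H}^* - \mathbf{H} \in \mathbb{Z}_q^{n \times n}$.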

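The key-generation algorithm of [MP12] produces a parity-check matrix $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ together with a
trapdoor $\mathbf{R}$ having desired tag $\mathbf{H}^*$. It does so by choosing (or being given) a uniformly random $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n \times \bar{m}}$
and a random $\mathbf{R} \in \mathbb{Z}^{\bar{m} \times nk}$ having small $s_1(\mathbf{R})$, and outputs $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{H}^* \cdot \mathbf{G} - \bar{\mathbf{A}}\mathbf{R}]$. For sufficiently large
$\bar{m} \ge Cn \lg q$ (where $C$ is a universal constant) and appropriate distribution of $\mathbf{R}$, the output matrix $\mathbf{A}$ is
uniformly random, up to $\mathrm{negl}(n)$ statistical distance.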
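The key-generation step and the trapdoor relation of Definition 3.1 can be checked on a toy instance. The following is a sketch under stated assumptions: the names `keygen` and `is_trapdoor` are hypothetical, $\mathbf{R}$ is drawn with entries in $\{-1, 0, 1\}$ as a placeholder for whatever small-norm distribution is actually required, and the parameters are far too small to be secure.

```python
import random

def mat_mul(A, B, q):
    """Return the matrix product A * B with entries reduced modulo q."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) % q for col in cols]
            for row in A]

def gadget_matrix(n, q):
    """G = I_n (tensor) (1, 2, ..., 2^(k-1)) with k = ceil(lg q)."""
    k = (q - 1).bit_length()
    return [[(1 << (j - i * k)) if i * k <= j < (i + 1) * k else 0
             for j in range(n * k)] for i in range(n)]

def keygen(n, m_bar, q, H_star, rng):
    """Return (A, R) with A = [A_bar | H* G - A_bar R] mod q."""
    k = (q - 1).bit_length()
    A_bar = [[rng.randrange(q) for _ in range(m_bar)] for _ in range(n)]
    # Placeholder "short" trapdoor: entries in {-1, 0, 1} (an assumption).
    R = [[rng.choice((-1, 0, 1)) for _ in range(n * k)] for _ in range(m_bar)]
    HG = mat_mul(H_star, gadget_matrix(n, q), q)
    AR = mat_mul(A_bar, R, q)
    right = [[(hg - ar) % q for hg, ar in zip(r1, r2)]
             for r1, r2 in zip(HG, AR)]
    A = [left + rgt for left, rgt in zip(A_bar, right)]
    return A, R

def is_trapdoor(A, R, H_star, n, q):
    """Check the defining relation A * [R; I] = H* * G (mod q)."""
    nk = len(R[0])
    identity = [[int(i == j) for j in range(nk)] for i in range(nk)]
    stacked = R + identity  # the block matrix [R; I]
    return mat_mul(A, stacked, q) == mat_mul(H_star, gadget_matrix(n, q), q)

if __name__ == "__main__":
    rng = random.Random(1)  # NOT cryptographic; illustration only
    H_star = [[3, 1], [0, 5]]
    A, R = keygen(2, 6, 17, H_star, rng)
    print(is_trapdoor(A, R, H_star, 2, 17))
```

The check works for any choice of $\bar{\mathbf{A}}$ and $\mathbf{R}$, since $\bar{\mathbf{A}}\mathbf{R} + (\mathbf{H}^*\mathbf{G} - \bar{\mathbf{A}}\mathbf{R}) = \mathbf{H}^*\mathbf{G}$; the distribution of $\mathbf{R}$ matters only for the uniformity of $\mathbf{A}$ and the trapdoor quality.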

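The discrete Gaussian sampling algorithm of [MP12] is an instance of the “convolution” approach
from [Pei10]. It works in two phases:

1.  In the offline “perturbation” phase, it takes as input a parity-check matrix $\mathbf{A}$, a trapdoor $\mathbf{R}$ for $\mathbf{A}$ with
some tag $\mathbf{H}^* \in \mathbb{Z}_q^{n \times n}$, and a Gaussian parameter $s \ge C \cdot s_1(\mathbf{R})$ (where $C$ is some universal constant).
It chooses one or more Gaussian perturbation vectors $\mathbf{p} \in \mathbb{Z}^m$ (one for each future call to the online
sampling step) having non-spherical covariance $\Sigma_{\mathbf{p}}$ that depends only on $s$ and the trapdoor $\mathbf{R}$.

2.  In the online “syndrome correction” phase, it is given a syndrome $\mathbf{u} \in \mathbb{Z}_q^n$ and a tag $\mathbf{H} \in \mathbb{Z}_q^{n \times n}$. As
long as $\mathbf{H}^* - \mathbf{H} \in \mathbb{Z}_q^{n \times n}$ is invertible, it chooses $\mathbf{z} \in \mathbb{Z}^{nk}$ having Gaussian distribution with parameter
$s_g \cdot \omega_n$ over an appropriate coset of $\Lambda^{\perp}(\mathbf{G})$, and outputs $\mathbf{x} = \mathbf{p} + \left[\begin{smallmatrix} \mathbf{R} \\ \mathbf{I} \end{smallmatrix}\right] \mathbf{z} \in \Lambda_{\mathbf{u}}^{\perp}(\mathbf{A}_{\mathbf{H}})$, where $\mathbf{p}$ is a
fresh perturbation from the offline step.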

Informally, the perturbation covariance $\Sigma_p$ of $p$ is carefully designed to cancel out the trapdoor-revealing
covariance of $y = \begin{bmatrix} R \\ I \end{bmatrix} z$, so that their sum has a (public) spherical Gaussian distribution. More formally,
the output $x$ has distribution within negl($n$) statistical distance of $D_{\Lambda_u^{\perp}(A_H),\, s \cdot \omega_n}$, and in particular does not
reveal any information about the trapdoor $R$ (aside from an upper bound $s$ on $s_1(R)$, which is public).

We emphasize that for security, it is essential that none of the intermediate values $p$, $z$, or $y = \begin{bmatrix} R \\ I \end{bmatrix} z$ be
revealed; otherwise they could be correlated with $x$ to leak information about the trapdoor $R$ that could lead
to an attack like the one given in [NR06].

3.2  Functionalities  for  Threshold  Sampling 

Here we present ideal functionalities corresponding to the above two algorithms. The key-generation and
Gaussian sampling functionalities $\mathcal{F}_{\mathrm{KG}}$ and $\mathcal{F}_{\mathrm{GS}}$ are specified in Figure 1 and Figure 2, respectively; they
internally execute the standalone algorithms described above.

To realize $\mathcal{F}_{\mathrm{KG}}$ in the trusted setup model (as used in [CG99]), we can simply let the trusted party play
the role of $\mathcal{F}_{\mathrm{KG}}$, because key generation is a one-time setup. To realize $\mathcal{F}_{\mathrm{GS}}$, for the purpose of modularity we
define two lower-level functionalities $\mathcal{F}_{\mathrm{Perturb}}$ and $\mathcal{F}_{\mathrm{Correct}}$ (Figures 3 and 4), which generate the perturbation
and syndrome-correction components, respectively, as in the standalone sampling algorithm. The $\mathcal{F}_{\mathrm{GS}}$,
$\mathcal{F}_{\mathrm{Perturb}}$, and $\mathcal{F}_{\mathrm{Correct}}$ functionalities are all initialized with a bound $B$ on the number of Gaussian samples
that they will produce in their lifetimes. This is because the trusted setup (or offline precomputation) phases
of our protocols need to prepare sufficient randomness so that the online phases can be noninteractive. (If
the bound $B$ is reached, then the parties can just initialize new copies of $\mathcal{F}_{\mathrm{GS}}$, $\mathcal{F}_{\mathrm{Perturb}}$, and $\mathcal{F}_{\mathrm{Correct}}$ using the
same arguments from $\mathcal{F}_{\mathrm{KG}}$.)
Functionality $\mathcal{F}_{\mathrm{KG}}$

Generate: Upon receiving (gen, sid, $\bar A \in \mathbb{Z}_q^{n \times \bar m}$, $H^* \in \mathbb{Z}_q^{n \times n}$, $z$) from at least $h$ honest parties in V:

• Choose $R \leftarrow D_{\mathbb{Z},z}^{\bar m \times nk}$ and compute a sharing $[\![R]\!]$ over $\mathbb{Z}_q$. Let $A = [\bar A \mid H^* \cdot G - \bar A R]$.

• Send (gen, sid, $A$, $[\![R]\!]^i$) to each party $i$ in V, and (gen, sid, $\bar A$, $H^*$, $z$) to the adversary.

Figure 1: Key generation functionality

Functionality $\mathcal{F}_{\mathrm{GS}}$

Initialize: Upon receiving (init, sid, $A$, $[\![R]\!]^i$, $H^*$, $s$, $B$) from at least $h$ honest parties $i$ in V:

• Reconstruct $R$ and store sid, $A$, $R$, $H^*$, $s$, and $B$.

• Send (init, sid) to each party in V, and (init, sid, $A$, $H^*$, $s$, $B$) to the adversary.

Sample: Upon receiving (sample, sid, $H \in \mathbb{Z}_q^{n \times n}$, $u \in \mathbb{Z}_q^n$) from at least $h$ honest parties in V, if $H^* - H \in
\mathbb{Z}_q^{n \times n}$ is invertible and fewer than $B$ calls to sample have already been made:

• Sample $x \leftarrow D_{\Lambda_u^{\perp}(A_H),\, s \cdot \omega_n}$ using the algorithm from [MP12] with trapdoor $R$.

• Send (sample, sid, $x$) to all parties in V, and (sample, sid, $H$, $u$, $x$) to the adversary.

Figure 2: Gaussian sampling functionality


We next describe the helper functionalities $\mathcal{F}_{\mathrm{Perturb}}$ and $\mathcal{F}_{\mathrm{Correct}}$, and describe how they can be realized
efficiently and noninteractively using trusted setup.

3.2.1  Perturbation 

Our perturbation functionality $\mathcal{F}_{\mathrm{Perturb}}$ (Figure 3) corresponds to the offline perturbation phase of the standalone
sampling algorithm. The perturb command does not take any inputs, so its results can be precomputed offline,
before the command is actually invoked. The possibility of precomputation introduces one subtlety in the
definition of the functionality, however. Notice that the functionality asks the adversary for share values $[\![p]\!]^i$
for the corrupted parties, then generates shares for the honest parties that are consistent with those shares;
clearly this does not affect the secrecy of $p$. This formulation is needed for proving security of a protocol that
precomputes shares of perturbations before any real calls to perturb are made (we give such a protocol in
Appendix A.4). It allows the simulator to choose shares on its own when simulating the precomputation, and
ensures that the functionality later distributes shares that are consistent with the simulation. Observe that with
trusted setup, $\mathcal{F}_{\mathrm{Perturb}}$ can be trivially realized by just precomputing and distributing shares of $B$ samples in
the initialization phase, which the parties then consume in the online phase.
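The step "generate shares for the honest parties that are consistent with the adversary's shares" can be sketched as follows. The sketch assumes simple additive sharing over $\mathbb{Z}_q$, which is chosen here only for brevity and need not match the threshold sharing scheme used by the protocols; the function name `consistent_additive_sharing` is hypothetical.

```python
import numpy as np

def consistent_additive_sharing(secret, corrupted_shares, num_parties, q, rng):
    """Return shares that sum to `secret` mod q and agree with fixed corrupted shares.

    corrupted_shares maps a party index to the share vector chosen by the
    adversary. Honest shares are uniform, except the last honest share,
    which is set so that all shares add up to the secret.
    """
    honest = [i for i in range(num_parties) if i not in corrupted_shares]
    if not honest:
        raise ValueError("need at least one honest party")
    shares = [None] * num_parties
    for i, s in corrupted_shares.items():
        shares[i] = np.asarray(s) % q
    for i in honest[:-1]:
        shares[i] = rng.integers(0, q, size=secret.shape)
    last = honest[-1]
    partial = sum(shares[i] for i in range(num_parties) if i != last)
    shares[last] = (secret - partial) % q
    return shares

rng = np.random.default_rng(1)
q = 17
p = np.array([3, -2, 5])                    # toy "perturbation"
corrupted = {1: np.array([4, 4, 4])}        # adversary fixes party 1's share
shares = consistent_additive_sharing(p, corrupted, 4, q, rng)
```

Because the adversary's shares are taken as given, a simulator that has already committed to those values during precomputation stays consistent with what the functionality later hands out.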

Note that $\mathcal{F}_{\mathrm{Perturb}}$ distributes shares $[\![p]\!]^i$ of a perturbation $p$ to the players, which themselves do not reveal
any information about $p$ to the adversary, just as in the standalone Gaussian sampling algorithm. However, in
order for the perturbation to be useful in the later syndrome-correction phase, the parties will need to know
(and so $\mathcal{F}_{\mathrm{Perturb}}$ reveals) some partial information about $p$, namely, the syndromes $\bar w = [\bar A \mid -\bar A R] \cdot p \in \mathbb{Z}_q^n$
and $w = [0 \mid G] \cdot p \in \mathbb{Z}_q^n$. This is the main significant difference with the standalone setting, in which these
same syndromes are calculated internally but never revealed. Informally, Lemma 3.2 below shows that the
syndromes are uniformly random (up to negligible error), and hence can be simulated without knowing $p$.
Furthermore, $p$ will still be a usable perturbation even after $\bar w$, $w$ are revealed, because it has an appropriate
(non-spherical) Gaussian parameter which sufficiently exceeds the smoothing parameter of an appropriate
lattice. This fact will be used later in the proof of security for our $\mathcal{F}_{\mathrm{GS}}$ realization.
Functionality $\mathcal{F}_{\mathrm{Perturb}}$

Initialize: Upon receiving (init, sid, $A_{H^*} = [\bar A \mid -\bar A R]$, $[\![R]\!]^i$, $s$, $B$) from at least $h$ honest parties $i$ in V:

• Reconstruct $R$ to compute covariance matrix $\Sigma_p = s^2 - s_g^2 \begin{bmatrix} R \\ I \end{bmatrix} \begin{bmatrix} R^t & I \end{bmatrix}$ and store sid, $A_{H^*}$, and $\Sigma_p$.

• Send (init, sid) to all parties in V and (init, sid, $A_{H^*}$, $s$, $B$) to the adversary.

Perturb: Upon receiving (perturb, sid) from at least $h$ honest parties in V, if fewer than $B$ calls to perturb have
already been made:

• Choose $p \leftarrow D_{\mathbb{Z}^m,\, \sqrt{\Sigma_p} \cdot \omega_n}$.

• Compute $\bar w = A_{H^*} \cdot p \in \mathbb{Z}_q^n$ and $w = [0 \mid G] \cdot p \in \mathbb{Z}_q^n$.

• Send (perturb, sid, $\bar w$, $w$) to the adversary, and receive back shares $[\![p]\!]^i \in \mathbb{Z}_q^m$ for each currently
corrupted party $i$ in V.

• Generate a uniformly random sharing $[\![p]\!]$ consistent with the shares received in the previous step.

• Send (perturb, sid, $[\![p]\!]^i$, $\bar w$, $w$) to each party $i$ in V.

Figure 3: Perturbation functionality


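Lemma 3.2. Let $\bar A \in \mathbb{Z}_q^{n \times \bar m}$ be uniformly random for $\bar m = m - nk \ge n \lg q + \omega(\log n)$, let

$$B = \begin{bmatrix} \bar A & -\bar A R \\ & G \end{bmatrix} = (\bar A \oplus G) \begin{bmatrix} I & -R \\ & I \end{bmatrix} \in \mathbb{Z}_q^{2n \times (\bar m + nk)}$$

(where $\oplus$ denotes the direct sum), and let $\Lambda = \Lambda^{\perp}(B)$. Then with all but negl($n$) probability over the choice
of $\bar A$, we have $\eta_\epsilon(\Lambda^{\perp}(B)) \le \sqrt{5}\,(s_1(R) + 1) \cdot \omega_n$ for some $\epsilon = $ negl($n$).

In particular, for $p \leftarrow D_{\mathbb{Z}^m, \sqrt{\Sigma_p}}$ where $\sqrt{\Sigma_p} \ge 6(s_1(R) + 1) \cdot \omega_n \ge 2\eta_\epsilon(\Lambda^{\perp}(B))$, the syndrome
$u = (\bar w, w) = Bp \in \mathbb{Z}_q^{2n}$ is negl($n$)-far from uniform, and the conditional distribution of $p$ given $u$ is
$D_{\Lambda_u^{\perp}(B), \sqrt{\Sigma_p}}$.

Proof. By Lemma 2.4 we have $\eta_{\epsilon'}(\Lambda^{\perp}(\bar A)) \le 2 \cdot \omega_n$ (with overwhelming probability) for some $\epsilon' = $ negl($n$).
Also, as shown in [MP12], we have $\eta_{\epsilon'}(\Lambda^{\perp}(G)) \le \sqrt{5} \cdot \omega_n$ (see Section 3.1). This implies that

$$\eta_\epsilon(\Lambda^{\perp}(\bar A \oplus G)) \le \sqrt{5} \cdot \omega_n,$$

where $(1 + \epsilon) = (1 + \epsilon')^2$, and in particular $\epsilon = $ negl($n$).

Since $T = \begin{bmatrix} I & -R \\ & I \end{bmatrix}$ is unimodular with inverse $T^{-1} = \begin{bmatrix} I & R \\ & I \end{bmatrix}$, it is easy to verify that $\Lambda^{\perp}(B) =
T^{-1} \cdot \Lambda^{\perp}(\bar A \oplus G)$, and hence

$$\eta_\epsilon(\Lambda^{\perp}(B)) \le s_1(T^{-1}) \cdot \eta_\epsilon(\Lambda^{\perp}(\bar A \oplus G)) \le \sqrt{5}\,(s_1(R) + 1) \cdot \omega_n. \qquad \Box$$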
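The linear-algebra facts used in this proof can be checked numerically on toy parameters. The sketch below assumes $q = 2^k$, a random small `R`, and illustrative dimensions; it verifies that $T$ is unimodular with the stated inverse, that $B = (\bar A \oplus G)\,T$ modulo $q$, and that $s_1(T^{-1}) \le s_1(R) + 1$. It does not say anything about smoothing parameters themselves.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m_bar = 2, 3, 5
q = 2 ** k
nk = n * k

g = (2 ** np.arange(k, dtype=np.int64)).reshape(1, k)
G = np.kron(np.eye(n, dtype=np.int64), g)
A_bar = rng.integers(0, q, size=(n, m_bar))
R = rng.integers(-1, 2, size=(m_bar, nk))           # toy small trapdoor

I_m = np.eye(m_bar, dtype=np.int64)
I_nk = np.eye(nk, dtype=np.int64)
Z_low = np.zeros((nk, m_bar), dtype=np.int64)

# T = [[I, -R], [0, I]] and its claimed inverse [[I, R], [0, I]].
T = np.block([[I_m, -R], [Z_low, I_nk]])
T_inv = np.block([[I_m, R], [Z_low, I_nk]])

# Direct sum A_bar (+) G and the matrix B from the lemma, both mod q.
direct_sum = np.block([
    [A_bar, np.zeros((n, nk), dtype=np.int64)],
    [np.zeros((n, m_bar), dtype=np.int64), G],
])
B = np.block([
    [A_bar, -A_bar @ R],
    [np.zeros((n, m_bar), dtype=np.int64), G],
]) % q

# Spectral norms (largest singular values).
s1_T_inv = np.linalg.norm(T_inv, 2)
s1_R = np.linalg.norm(R, 2)
```

Since $B\,T^{-1} = \bar A \oplus G$ modulo $q$, a vector $x$ lies in $\Lambda^{\perp}(\bar A \oplus G)$ exactly when $T^{-1}x$ lies in $\Lambda^{\perp}(B)$, which is the lattice identity the proof relies on.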


3.2.2  Syndrome  Correction 

Our functionality $\mathcal{F}_{\mathrm{Correct}}$ (Figure 4) corresponds to the syndrome-correction step of the standalone sampling
algorithm. Because its output $y$ must lie in a certain coset $\Lambda_v^{\perp}(A)$, where $v$ depends on the desired final
syndrome $u$, the functionality must be invoked online. As indicated in the overview, the standalone algorithm
samples $z$ from a discrete Gaussian over the coset $\Lambda_v^{\perp}(G)$ and defines $y = \begin{bmatrix} R \\ I \end{bmatrix} z$. The functionality does the same, but outputs only shares of $y$
to their respective owners. This ensures that no information about $y$ is revealed to the adversary. (Note that
the input syndrome $v$ itself is not revealed in the standalone algorithm, but in our setting $v$ is determined
Functionality $\mathcal{F}_{\mathrm{Correct}}$

Initialize: Upon receiving (init, sid, $[\![R]\!]^i$, $B$) from at least $h$ honest parties $i$ in V:

• Reconstruct $R$ and store sid, $R$, and $B$.

• Send (init, sid) to all parties in V and (init, sid, $B$) to the adversary.

Correct: Upon receiving (correct, sid, $v$) from at least $h$ honest parties in V, if fewer than $B$ calls to correct
have already been made:

• Sample $z \leftarrow D_{\Lambda_v^{\perp}(G),\, s_g \cdot \omega_n}$ and compute $y = \begin{bmatrix} R \\ I \end{bmatrix} z$.

• Send (correct, sid, $v$) to the adversary, receive shares $[\![y]\!]^i \in \mathbb{Z}_q^m$ for each currently corrupted party $i$,
and generate a uniformly random sharing $[\![y]\!]$ consistent with these shares.

• Send (correct, sid, $[\![y]\!]^i$) to each party $i$ in V.

Figure 4: Syndrome correction functionality


entirely by public information, so it is known to the adversary.) Also, just like $\mathcal{F}_{\mathrm{Perturb}}$, the functionality asks
the simulator for shares for the corrupted parties, to make precomputation simulatable.

Realizing $\mathcal{F}_{\mathrm{Correct}}$ with a noninteractive protocol relies crucially on the parallel and offline nature of the
corresponding step in the standalone algorithm. In particular, we use the fact that without knowing $v$ in
advance, the algorithm can precompute partial samples for each of the $q = $ poly($n$) scalar values $v \in \mathbb{Z}_q$,
and then linearly combine $n$ such partial samples to answer a query for a full syndrome $v \in \mathbb{Z}_q^n$.

In the trusted setup model, the protocol realizing $\mathcal{F}_{\mathrm{Correct}}$ is as follows.

1. In the offline phase, a trusted party uses the trapdoor $R$ (with tag $H^*$) to distribute shares as follows.
For each $j \in [n]$ and $v \in \mathbb{Z}_q$, the party initializes queues $Q^i_{j,v}$ for each party $i$, does the following $B$
times, and then gives each of the resulting queues $Q^i_{j,v}$ to party $i$.

• Sample $z_{j,v} \leftarrow D_{\Lambda_v^{\perp}(g^t),\, s_g \cdot \omega_n}$.

• Compute $y_{j,v} = \begin{bmatrix} R \\ I \end{bmatrix} (e_j \otimes z_{j,v})$, where $e_j \in \mathbb{Z}^n$ denotes the $j$th standard basis vector. Note that

$$A_H \cdot y_{j,v} = (H^* - H)\, G \cdot (e_j \otimes z_{j,v}) = (H^* - H)(v \cdot e_j),$$

where as always, $A_H = A - [0 \mid HG]$ for any $H \in \mathbb{Z}_q^{n \times n}$.

• Generate a sharing $[\![y_{j,v}]\!]$ for $y_{j,v}$, and add $[\![y_{j,v}]\!]^i$ to queue $Q^i_{j,v}$ for each party $i \in$ V.

2. In the online phase, upon receiving (correct, sid, $v$), each party $i$ dequeues an entry $[\![y_{j,v_j}]\!]^i$ from
$Q^i_{j,v_j}$ for each $j \in [n]$, and locally outputs $[\![y]\!]^i = \sum_{j \in [n]} [\![y_{j,v_j}]\!]^i$. Note that by linearity and the
homomorphic properties of secret sharing, the shares $[\![y]\!]^i$ recombine to $y = \begin{bmatrix} R \\ I \end{bmatrix} z \in \mathbb{Z}^m$ for some
Gaussian-distributed $z$ of parameter $s_g \cdot \omega_n$, such that $A_H \cdot y = (H^* - H) \cdot v \in \mathbb{Z}_q^n$.
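The two phases above can be mimicked in a few lines. This is a sketch under loud assumptions: $q = 2^k$, additive secret sharing over $\mathbb{Z}_q$, and a binary decomposition in place of the discrete Gaussian sample $z_{j,v}$, so it demonstrates only the queueing and the syndrome identity, not the output distribution or any security property. All names (`toy_z`, `share`, `correct`, `queues`) are hypothetical.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(3)
n, k, m_bar, B_calls, parties = 2, 3, 5, 2, 3
q = 2 ** k
nk = n * k

g = (2 ** np.arange(k, dtype=np.int64)).reshape(1, k)
G = np.kron(np.eye(n, dtype=np.int64), g)
A_bar = rng.integers(0, q, size=(n, m_bar))
R = rng.integers(-1, 2, size=(m_bar, nk))
H_star = np.eye(n, dtype=np.int64)
A = np.hstack([A_bar, H_star @ G - A_bar @ R]) % q
RI = np.vstack([R, np.eye(nk, dtype=np.int64)])     # [R; I]

def toy_z(v):
    """Bits of v, so <g, z> = v mod q. Stand-in for a Gaussian coset sample."""
    return np.array([(v >> i) & 1 for i in range(k)], dtype=np.int64)

def share(y):
    """Additive sharing of y over Z_q among all parties."""
    parts = [rng.integers(0, q, size=y.shape) for _ in range(parties - 1)]
    parts.append((y - sum(parts)) % q)
    return parts

# Offline phase: the trusted party fills queue Q^i_{j,v} with B_calls shares each.
queues = {(i, j, v): deque()
          for i in range(parties) for j in range(n) for v in range(q)}
for j in range(n):
    e_j = np.zeros(n, dtype=np.int64)
    e_j[j] = 1
    for v in range(q):
        for _ in range(B_calls):
            y_jv = RI @ np.kron(e_j, toy_z(v))
            for i, s in enumerate(share(y_jv)):
                queues[(i, j, v)].append(s)

def correct(i, v_vec):
    """Online phase for party i: dequeue one share per coordinate and add them."""
    return sum(queues[(i, j, int(v_vec[j]))].popleft() for j in range(n)) % q

v_vec = np.array([5, 2])
H = np.array([[0, 1], [0, 0]])
zeros = np.zeros((n, m_bar), dtype=np.int64)
A_H = (A - np.hstack([zeros, H @ G])) % q

# Recombining the parties' local outputs gives y with A_H y = (H* - H) v mod q.
y = sum(correct(i, v_vec) for i in range(parties)) % q
```

The online step touches only local queues, which is why no messages are needed until the final shares are announced. The price is offline storage proportional to $n \cdot q \cdot B$ entries per party.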


Without trusted setup, we give in Appendix A.3 an efficient protocol for $\mathcal{F}_{\mathrm{Correct}}$ that operates in a
similar way, populating the local queues $Q^i_{j,v}$ during the offline phase in a distributed manner using standard
share-blinding and multiplication functionalities, among others.


3.2.3  Legal  Uses  of  the  Functionalities 

Putting the key-generation and Gaussian sampling operations into separate functionalities $\mathcal{F}_{\mathrm{KG}}$ and $\mathcal{F}_{\mathrm{GS}}$ (and
$\mathcal{F}_{\mathrm{DelTrap}}$ for delegation, in Section 3.4 below), and realizing $\mathcal{F}_{\mathrm{GS}}$ using the helper functionalities $\mathcal{F}_{\mathrm{Perturb}}$ and
$\mathcal{F}_{\mathrm{Correct}}$, aids modularity and simplifies the analysis of our protocols. However, as a side effect it also raises a
technical issue in the UC framework, since environments can in general provide functionalities with arbitrary
inputs, even on behalf of honest users. The issue is that the functionalities are all designed to be initialized
with some common, valid state (namely, shares of a trapdoor $R$ for a matrix $A$ as produced by $\mathcal{F}_{\mathrm{KG}}$ on valid
inputs), but it might be expensive or impossible for the corresponding protocols to check the consistency and
validity of those shares. Moreover, such checks would be unnecessary in the typical case where an application
protocol, such as a threshold signature scheme, initializes the functionalities as intended.¹

Therefore, we prove UC security for a restricted class of environments $\mathfrak{Z}$ that always initialize our
functionalities with valid arguments. In particular, environments in $\mathfrak{Z}$ can instruct parties to instantiate $\mathcal{F}_{\mathrm{KG}}$
only with appropriate arguments $\bar A$, $z$. Similarly, $\mathcal{F}_{\mathrm{GS}}$ (and $\mathcal{F}_{\mathrm{DelTrap}}$) can be initialized only with a matrix $A$,
tag $H^*$, and shares of a trapdoor $R$ matching those of a prior call to the gen command of $\mathcal{F}_{\mathrm{KG}}$, and with a
sufficiently large Gaussian parameter $s \ge C s_1 \cdot \omega_n$, where $s_1$ is a high-probability upper bound on $s_1(R)$ for
the trapdoor $R$ generated by $\mathcal{F}_{\mathrm{KG}}$. The functionalities $\mathcal{F}_{\mathrm{Perturb}}$ and $\mathcal{F}_{\mathrm{Correct}}$ (which in any case are not intended
for direct use by applications) also must be initialized using a prior output of $\mathcal{F}_{\mathrm{KG}}$.

More formally, an environment $\mathcal{Z}$ is said to be in the class $\mathfrak{Z}$ if it satisfies the following conditions
specific to our functionalities:

• When $\mathcal{Z}$ instructs honest parties to run the gen command of $\mathcal{F}_{\mathrm{KG}}$, the input matrix $\bar A \in \mathbb{Z}_q^{n \times \bar m}$
and parameter $z$ must correspond with a statistically secure instantiation of the trapdoor generator
from [MP12]. Concretely, $\bar A$ must be statistically close to uniformly random, and the Gaussian
parameter $z$ and dimension $\bar m$ must jointly be sufficiently large. As shown in [MP12], one valid
instantiation is to let $z \ge 2\omega_n$ and $\bar m \ge C n \lg q$ for any fixed constant $C > 1$.

• When $\mathcal{Z}$ instructs honest parties to run the init commands of $\mathcal{F}_{\mathrm{GS}}$, $\mathcal{F}_{\mathrm{Perturb}}$, or $\mathcal{F}_{\mathrm{Correct}}$, the matrix $A$,
tag $H^*$, and shares $[\![R]\!]^i$ provided as the parties' inputs must match those of a prior call to $\mathcal{F}_{\mathrm{KG}}$.gen. In
addition, these init commands must all use the same Gaussian parameter $s$, which must be sufficiently
large relative to the Gaussian parameter $z$ and dimension $\bar m$ used in that call to $\mathcal{F}_{\mathrm{KG}}$.gen. Specifically,
we require $s \ge C z (\sqrt{\bar m} + \sqrt{n \log q}) \cdot \omega_n$ for a certain universal constant $C$. By the results of [MP12],
this guarantees that with overwhelming probability over the choice of $R$, we have $s \ge C' s_1(R) \cdot \omega_n$
for some universal constant $C'$, which ensures that our $\pi_{\mathrm{GS}}$ protocol produces the proper distribution.

We emphasize that these restrictions on the environment are not actually limiting in any meaningful way,
since our functionalities are only intended to serve as subroutines in higher-level applications, e.g., threshold
signatures and (H)IBE. When designing a protocol $\phi$ that uses these functionalities (see, e.g., Section 4) one
simply needs to ensure that $\phi$ does so in a manner consistent with the above conditions. Then composing $\phi$
with our protocols $\pi_{\mathrm{KG}}$, $\pi_{\mathrm{GS}}$, $\pi_{\mathrm{Perturb}}$, and $\pi_{\mathrm{Correct}}$ (which we prove secure against environments in $\mathfrak{Z}$) will
yield a secure protocol against any $t$-limited environment.

3.3  Gaussian  Sampling  Protocol 

Figure 5 defines a protocol $\pi_{\mathrm{GS}}$ that realizes the Gaussian sampling functionality $\mathcal{F}_{\mathrm{GS}}$ in the ($\mathcal{F}_{\mathrm{Perturb}}$, $\mathcal{F}_{\mathrm{Correct}}$)-hybrid
model. Its sample command simply makes one call to each of the main commands of $\mathcal{F}_{\mathrm{Perturb}}$ and
$\mathcal{F}_{\mathrm{Correct}}$, adjusting the requested syndrome as necessary to ensure that the syndrome of the final output is
the desired one. (This is done exactly as in the standalone algorithm.) The shares of the perturbation $p$ and
syndrome-correction term $y$ are then added locally and announced, allowing the players to reconstruct the
final output $x = p + y$. The security of $\pi_{\mathrm{GS}}$ is formalized in Theorem 3.3 and proved via the simulator $\mathcal{S}_{\mathrm{GS}}$
in Figure 6.

¹ This issue is not limited to our setting, and can arise any time the key-generation and secret-key operations of a threshold scheme
are put into separate functionalities. We note that using “joint state” [CR03] does not appear to resolve the issue, because it only
allows multiple instances of the same protocol to securely share some joint state.
An essential point is that given the helper functionalities, the protocol $\pi_{\mathrm{GS}}$ is completely noninteractive,
i.e., no messages are exchanged among the parties, except when broadcasting their shares of the final output.
Similarly, recall that our realizations of $\mathcal{F}_{\mathrm{Perturb}}$ and $\mathcal{F}_{\mathrm{Correct}}$ are also noninteractive, either when using trusted
setup or offline precomputation (see Appendix A). In other words, in the fully realized sampling protocol, the
parties can sample from any desired coset using only local computation, plus one broadcast of the final output
shares. We emphasize that this kind of noninteractivity is nontrivial, because the number of possible cosets is
exponentially large.

Protocol $\pi_{\mathrm{GS}}$ in the ($\mathcal{F}_{\mathrm{Perturb}}$, $\mathcal{F}_{\mathrm{Correct}}$)-hybrid model

Initialize: On input (init, sid, $A$, $[\![R]\!]^i$, $H^*$, $s$, $B$), party $i$ stores $H^*$, calls $\mathcal{F}_{\mathrm{Perturb}}$(init, sid, $A_{H^*}$, $[\![R]\!]^i$, $s$, $B$)
and $\mathcal{F}_{\mathrm{Correct}}$(init, sid, $[\![R]\!]^i$, $B$), and outputs (init, sid).

Sample: On input (sample, sid, $H$, $u$), if $H^* - H \in \mathbb{Z}_q^{n \times n}$ is invertible, and if fewer than $B$ calls to sample have
already been made, then party $i$ does:

• Call $\mathcal{F}_{\mathrm{Perturb}}$(perturb, sid) and receive (perturb, sid, $[\![p]\!]^i$, $\bar w$, $w$).

• Compute $v = (H^* - H)^{-1}(u - \bar w) - w \in \mathbb{Z}_q^n$.

• Call $\mathcal{F}_{\mathrm{Correct}}$(correct, sid, $v$) and receive (correct, sid, $[\![y]\!]^i$).

• Broadcast $[\![x]\!]^i = [\![p]\!]^i + [\![y]\!]^i$ and reconstruct $x = p + y$ from the announced shares.

• Output (sample, sid, $x$).

Figure 5: Gaussian sampling protocol
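The syndrome adjustment in the Sample command can be traced with a toy run. The sketch below assumes $q = 2^k$, additive sharing over $\mathbb{Z}_q$, a small uniform `p` in place of the Gaussian perturbation, and a binary decomposition in place of the Gaussian $z$; the helper functionalities are inlined as plain code rather than modeled as separate parties. It checks only that the reconstructed $x$ satisfies $A_H x = u \bmod q$, and the names `share`, `bits`, and `sample` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, m_bar, parties = 2, 3, 5, 3
q = 2 ** k
nk = n * k
m = m_bar + nk

g = (2 ** np.arange(k, dtype=np.int64)).reshape(1, k)
G = np.kron(np.eye(n, dtype=np.int64), g)
A_bar = rng.integers(0, q, size=(n, m_bar))
R = rng.integers(-1, 2, size=(m_bar, nk))

H_star = np.eye(n, dtype=np.int64)
H = np.array([[0, -1], [0, 0]])
D = H_star - H                           # [[1, 1], [0, 1]], invertible mod q
D_inv = np.array([[1, -1], [0, 1]])      # hard-coded inverse of D (assumption of this toy)

zeros = np.zeros((n, m_bar), dtype=np.int64)
A = np.hstack([A_bar, H_star @ G - A_bar @ R]) % q
A_H = (A - np.hstack([zeros, H @ G])) % q
A_Hstar = np.hstack([A_bar, -A_bar @ R]) % q        # A_{H*} = [A_bar | -A_bar R]
RI = np.vstack([R, np.eye(nk, dtype=np.int64)])

def share(x):
    """Additive sharing of x over Z_q."""
    parts = [rng.integers(0, q, size=x.shape) for _ in range(parties - 1)]
    parts.append((x - sum(parts)) % q)
    return parts

def bits(v):
    """Binary digits of v (stand-in for a Gaussian sample with <g, z> = v)."""
    return np.array([(v >> i) & 1 for i in range(k)], dtype=np.int64)

def sample(u):
    # Stand-in for F_Perturb: perturbation p, its shares, and public syndromes.
    p = rng.integers(-2, 3, size=m)
    w_bar = A_Hstar @ p % q
    w = G @ p[m_bar:] % q
    p_shares = share(p)
    # Local step: adjusted syndrome v = (H* - H)^{-1}(u - w_bar) - w.
    v = (D_inv @ (u - w_bar) - w) % q
    # Stand-in for F_Correct: y = [R; I] z with G z = v, handed out as shares.
    z = np.concatenate([bits(int(vj)) for vj in v])
    y_shares = share(RI @ z)
    # Each party adds its two shares and broadcasts; everyone reconstructs x.
    x_shares = [(ps + ys) % q for ps, ys in zip(p_shares, y_shares)]
    return sum(x_shares) % q

u = np.array([3, 6])
x = sample(u)
```

Only the last line of `sample` corresponds to communication between parties; everything before it is either a call to a helper functionality or local arithmetic.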


Simulator $\mathcal{S}_{\mathrm{GS}}$

Initialize: Upon receiving (init, sid, $A$, $H^*$, $s$, $B$) from $\mathcal{F}_{\mathrm{GS}}$, reveal to $\mathcal{Z}$ (init, sid) as outputs of both $\mathcal{F}_{\mathrm{Perturb}}$
and $\mathcal{F}_{\mathrm{Correct}}$ to each currently corrupted party and any party that is corrupted in the future.

Sample: Upon receiving (sample, sid, $H$, $u$, $x$) from $\mathcal{F}_{\mathrm{GS}}$:

• Choose uniform and independent $\bar w, w \in \mathbb{Z}_q^n$ and compute $v = (H^* - H)^{-1}(u - \bar w) - w \in \mathbb{Z}_q^n$.

• On behalf of $\mathcal{F}_{\mathrm{Perturb}}$, send (perturb, sid, $\bar w$, $w$) to $\mathcal{Z}$ and receive back shares $[\![p]\!]^i$ for each currently
corrupted party $i$ in V. Generate a uniformly random sharing $[\![p]\!]$ of $p = 0$ consistent with these
shares. Send (perturb, sid, $[\![p]\!]^i$, $\bar w$, $w$) to each corrupted party $i$ in V on behalf of $\mathcal{F}_{\mathrm{Perturb}}$.

• On behalf of $\mathcal{F}_{\mathrm{Correct}}$, send (correct, sid, $v$) to $\mathcal{Z}$ and receive back shares $[\![y]\!]^i$ for each currently
corrupted party $i$ in V. Generate a uniformly random sharing $[\![y]\!]$ of $y = x$ consistent with these
shares. Send (correct, sid, $[\![y]\!]^i$) to each corrupted party $i$ in V on behalf of $\mathcal{F}_{\mathrm{Correct}}$.

• Broadcast $[\![x]\!]^i = [\![p]\!]^i + [\![y]\!]^i$ on behalf of each honest party $i$.

Corruption: When $\mathcal{Z}$ requests to corrupt party $i$, for each previous call to sample, reveal the corresponding
messages (perturb, sid, $[\![p]\!]^i$, $\bar w$, $w$) and (correct, sid, $[\![y]\!]^i$) to party $i$ on behalf of $\mathcal{F}_{\mathrm{Perturb}}$ and $\mathcal{F}_{\mathrm{Correct}}$,
respectively.

Figure 6: Simulator for $\pi_{\mathrm{GS}}$


Theorem 3.3. Protocol $\pi_{\mathrm{GS}}$ statistically realizes $\mathcal{F}_{\mathrm{GS}}$ in the ($\mathcal{F}_{\mathrm{Perturb}}$, $\mathcal{F}_{\mathrm{Correct}}$)-hybrid model for $t$-limited
environments in $\mathfrak{Z}$.

Proof sketch. The simulator $\mathcal{S}_{\mathrm{GS}}$ in Figure 6 maintains consistent sharings of $p = 0$ and $y = x$ for each call
to sample, and it releases player $i$'s shares of these values (on behalf of $\mathcal{F}_{\mathrm{Perturb}}$ and $\mathcal{F}_{\mathrm{Correct}}$) upon corruption
of player $i$. The fact that $p$ and $y$ in $\mathcal{S}_{\mathrm{GS}}$ are from incorrect distributions is not detectable (even statistically)
by the environment $\mathcal{Z}$, because it sees at most $t$ shares of each, and the shares are consistent with announced
shares of $x = p + y$.

The only other significant issues relate to (1) the syndromes $\bar w$, $w$ output publicly by $\mathcal{F}_{\mathrm{Perturb}}$ in the
($\mathcal{F}_{\mathrm{Perturb}}$, $\mathcal{F}_{\mathrm{Correct}}$)-hybrid world, versus the simulator's choices of those values on behalf of $\mathcal{F}_{\mathrm{Perturb}}$ in the
ideal world; and (2) the distribution (conditioned on any fixed $\bar w$, $w$) of the final output $x$ in both worlds.
For item (1), as proved in Lemma 3.2, in the hybrid world the syndromes $\bar w$, $w$ are jointly uniform and
independent (up to negligible statistical distance) over the choice of $p$ by $\mathcal{F}_{\mathrm{Perturb}}$, just as they are when
produced by the simulator. Moreover, conditioned on any fixed values of $\bar w$, $w$, the distribution of $p$ in the
hybrid world is a discrete Gaussian with covariance $\Sigma_p$ over a certain coset of the lattice $\Lambda^{\perp}(B)$, and the actual
value of $p$ from this distribution is perfectly hidden by the secret-sharing scheme.

For item (2), the above facts imply that in the hybrid world, $x = p + y$ has spherical discrete Gaussian
distribution $D_{\Lambda_u^{\perp}(A_H),\, s}$, just as the output $x$ of $\mathcal{F}_{\mathrm{GS}}$ does in the ideal world (up to negligible statistical error
in both cases). The proof is essentially identical to that of the “convolution lemma” from [MP12], which
guarantees the correctness of the standalone sampling algorithm (as run by $\mathcal{F}_{\mathrm{GS}}$ in the ideal world). The only
slight difference is that in the hybrid world, $p$'s distribution (conditioned on any fixed values of $\bar w$, $w$) is
a discrete Gaussian with parameter $\sqrt{\Sigma_p}$ over a coset of $\Lambda^{\perp}(B)$, instead of over $\mathbb{Z}^m$ as in the standalone
algorithm. Fortunately, Lemma 3.2 says that $\sqrt{\Sigma_p} \ge 2\eta_\epsilon(\Lambda^{\perp}(B))$, and this is enough to adapt the proof
from [MP12] to the different distribution of $p$.

Finally, by the homomorphic properties of secret sharing, the shares $[\![p]\!]^i + [\![y]\!]^i$ announced by the honest
parties are jointly distributed exactly as a fresh sharing of $x$ as produced by the simulator. We conclude that
the hybrid and real views are statistically indistinguishable, as desired. $\Box$


3.4  Trapdoor  Delegation 

The trapdoor delegation functionality $\mathcal{F}_{\mathrm{DelTrap}}$ given in Figure 7 corresponds to the algorithm DelTrap for
delegating a lattice trapdoor in [MP12], which is used in hierarchical IBE schemes. The functionality is
initialized with shares of a trapdoor $R$ for some $A \in \mathbb{Z}_q^{n \times m}$. For an extended matrix $A' = [A_H \mid A_1] \in
\mathbb{Z}_q^{n \times (m + nk)}$ (where $A_H = A - [0 \mid HG]$) and tag $H' \in \mathbb{Z}_q^{n \times n}$, $\mathcal{F}_{\mathrm{DelTrap}}$ outputs shares of a trapdoor $R'$
for $A'$ with tag $H'$, where the distribution of $R'$ is Gaussian and in particular is independent of $R$.

A realization of $\mathcal{F}_{\mathrm{DelTrap}}$ in the $\mathcal{F}_{\mathrm{GS}}$-hybrid model is given by $\pi_{\mathrm{DelTrap}}$ in Figure 8. It is entirely
straightforward, since the standalone algorithm from [MP12] just draws several Gaussian samples over appropriate
(publicly computable) cosets of $\Lambda^{\perp}(A)$, so we omit the proof of security.


4  Threshold  Signatures  and  IBE 

Here we apply our protocols in a straightforward manner to give threshold versions of the signature and
identity-based encryption schemes from [GPV08]. Other signature and (H)IBE schemes that use key-generation and
Gaussian sampling as “black boxes” can be similarly adapted to the threshold setting.

The GPV schemes. For security parameter $n$, modulus $q$, and message space $\mathcal{M}$, the GPV signature scheme
uses a hash function $H : \mathcal{M} \to \mathbb{Z}_q^n$, which is modeled as a random oracle, and two algorithms GenTrap and
SampleD. At a high level, GenTrap($n$, $q$, $m$) generates a nearly uniform matrix $A \in \mathbb{Z}_q^{n \times m}$ together with
a trapdoor $R$. Using these, SampleD($A$, $R$, $u$, $s$) generates a Gaussian sample (for any sufficiently large
Functionality $\mathcal{F}_{\mathrm{DelTrap}}$

Initialize: Upon receiving (init, sid, $A$, $[\![R]\!]^i$, $H^*$, $s$, $B$) from at least $h$ honest parties in V:

• Reconstruct trapdoor $R$ and its invertible tag $H^*$ for $A$, and store sid, $A$, $R$, $H^*$, and $s$.

• Send (init, sid) to each party in V, and (init, sid, $A$, $s$, $B$) to the adversary.

Delegate: Upon receiving (delegate, sid, $H$, $A_1$, $H'$) from at least $h$ honest parties in V:

• If $H^* - H \in \mathbb{Z}_q^{n \times n}$ is invertible, using the Gaussian sampling algorithm from [MP12] with trapdoor
$R$, sample each column of $R'$ independently from a discrete Gaussian with parameter $s$ over the
appropriate coset of $\Lambda^{\perp}(A_H)$, so that $A_H \cdot R' = H' \cdot G - A_1$.

• Compute a sharing $[\![R']\!]$ over $\mathbb{Z}_q$, and send (delegate, sid, $[\![R']\!]^i$) to each party $i$, and
(delegate, sid, $H$, $A_1$, $H'$) to the adversary.

Figure 7: Functionality for delegating a lattice trapdoor

Protocol $\pi_{\mathrm{DelTrap}}$ in the $\mathcal{F}_{\mathrm{GS}}$-hybrid model

Initialize: On input (init, sid, $A$, $[\![R]\!]^i$, $H^*$, $s$, $B$), call $\mathcal{F}_{\mathrm{GS}}$(init, sid, $A$, $[\![R]\!]^i$, $H^*$, $s$, $Bnk$).

Delegate: On input (delegate, sid, $H$, $A_1$, $H'$), party $i$ does the following:

• For each $j = 1, \ldots, nk$, call $\mathcal{F}_{\mathrm{GS}}$(sample, sid, $H$, $u_j$) where $u_j$ is the $j$th column of $H'G - A_1$; receive
(sample, sid, $r'_j$) from $\mathcal{F}_{\mathrm{GS}}$ and let $r'_j$ be the $j$th column of $R'$.

• Output (sample, sid, $R'$).

Figure 8: Protocol for delegating a lattice trapdoor


parameter $s$) over the lattice coset $\Lambda_u^{\perp}(A)$. Ignoring the exact selection of parameters, the stateful version of
the signature scheme consists of the following three algorithms:

• KeyGen($1^n$): Let $(A, R) \leftarrow$ GenTrap($n$, $q$, $m$) and output verification key $vk = A$ and signing key
$sk = R$.

• Sign($sk$, $\mu \in \mathcal{M}$): If $(\mu, \sigma)$ is already in local storage, output the signature $\sigma$. Otherwise, let $x \leftarrow$
SampleD($A$, $R$, $H(\mu)$, $s$) and store $(\mu, \sigma)$. Output the signature $\sigma = x$.

• Verify($vk$, $\mu$, $\sigma = x$): If $Ax = H(\mu)$ and $x$ is sufficiently short, then accept; otherwise, reject.

See [GPV08] for the proof of (strong) unforgeability under worst-case lattice assumptions.
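The control flow of the stateful scheme (hash the message to a syndrome, reuse a stored signature if one exists, otherwise sample a preimage) can be sketched as below. This is emphatically not a secure implementation: `toy_sample_d` is a deterministic, trapdoor-leaking stand-in for SampleD, the parameters are tiny, the tag is fixed to $H^* = I$, and the SHA-256-based `hash_to_syndrome` is just one assumed way to model the random oracle. All function and class names are hypothetical.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(5)
n, k, m_bar = 2, 4, 6
q = 2 ** k
nk = n * k

g = (2 ** np.arange(k, dtype=np.int64)).reshape(1, k)
G = np.kron(np.eye(n, dtype=np.int64), g)

def keygen():
    """Toy GenTrap with tag H* = I: returns (vk, sk) = (A, R)."""
    A_bar = rng.integers(0, q, size=(n, m_bar))
    R = rng.integers(-1, 2, size=(m_bar, nk))
    A = np.hstack([A_bar, G - A_bar @ R]) % q
    return A, R

def hash_to_syndrome(msg: bytes):
    """Map a message to Z_q^n using SHA-256 bytes (assumed oracle instantiation)."""
    digest = hashlib.sha256(msg).digest()
    return np.array([digest[i] % q for i in range(n)], dtype=np.int64)

def toy_sample_d(R, u):
    """INSECURE stand-in for SampleD: x = [R; I] z with G z = u (z = bits of u)."""
    z = np.concatenate([[(int(ui) >> i) & 1 for i in range(k)] for ui in u])
    return np.vstack([R, np.eye(nk, dtype=np.int64)]) @ z

class StatefulSigner:
    """Stores one signature per message so a message is never re-sampled."""
    def __init__(self, A, R):
        self.A, self.R, self.store = A, R, {}

    def sign(self, msg: bytes):
        if msg not in self.store:
            self.store[msg] = toy_sample_d(self.R, hash_to_syndrome(msg))
        return self.store[msg]

def verify(A, msg, x, bound):
    """Accept iff A x = H(msg) mod q and x is short (Euclidean norm <= bound)."""
    ok_syndrome = np.array_equal(A @ x % q, hash_to_syndrome(msg))
    return bool(ok_syndrome and float(np.linalg.norm(x)) <= bound)

A, R = keygen()
signer = StatefulSigner(A, R)
bound = nk * float(np.sqrt(m_bar + 1))   # crude norm bound valid for this toy sampler
sig = signer.sign(b"hello")
```

With a real randomized sampler, the stored table is what prevents two different short preimages of the same $H(\mu)$ from ever being released.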

In  the  GPV  identity-based  encryption  scheme,  the  setup  algorithm  is  the  same  as  KeyGen  above,  and 
the  master  public  and  secret  keys  are  simply  the  verification  and  signing  keys  above.  The  secret  key  for  an 
individual  identity  is  a  signature  on  that  identity.  Since  we  are  concerned  only  with  thresholdizing  the  signing 
and  key-extraction  algorithms,  the  details  of  the  encryption  and  decryption  algorithms  are  unchanged  and 
irrelevant here, so we need only give threshold versions of KeyGen and Sign.

Thresholdizing. In order to obtain a threshold signature scheme, KeyGen and Sign must be done in a
distributed way, so that the signing key $sk = R$ is distributed among the participating parties and a valid
signature $\sigma$ can only be produced by a quorum of participating parties. In Figure 9 we recall (from [ADN06])
a formal functionality for threshold signatures. (Recall that V is the set of trustees, or parties authorized to
receive shares of the signing key.)
Functionality $\mathcal{F}_{\mathrm{Tsig}}$

Generate: Upon receiving (gen, sid, $B$) from at least $h$ honest parties in V, send (gen, sid, $B$) to the adversary,
receive and record verification key $v$, and send (gen, sid, $v$) to each party in V.

Sign: Upon receiving (sign, sid, $m$) from at least $h$ honest parties in V, if fewer than $B$ calls to sign have already
been made:

• Send (sign, sid, $m$) to the adversary and receive signature $\sigma$.

• If there is no record of $(m, \sigma, v, 0)$, record $(m, \sigma, v, 1)$ and send (sign, sid, $m$, $\sigma$) to each party in V.

Verify: Upon receiving (ver, sid, $m$, $\sigma$, $v'$) from any party $p \in$ V:

• Send (ver, sid, $m$, $\sigma$, $v'$) to the adversary and receive (ver, sid, $m$, $\phi$).

• If $v' = v$ and $(m, \sigma, v, 1)$ is recorded, then send (ver, sid, $m$, 1) to $p$.

• If $v' = v$ and there is no recorded $(m, \sigma', v, 1)$, then record $(m, \sigma, v, 0)$ and send (ver, sid, $m$, 0) to $p$.

• If some $(m, \sigma, v', 1)$ is recorded, then send (ver, sid, $m$, 1) to $p$.

• If some $(m, \sigma, v', 0)$ is recorded, then send (ver, sid, $m$, 0) to $p$.

• Otherwise, record $(m, \sigma, v', \phi)$ and send (ver, sid, $m$, $\sigma$, $\phi$) to $p$.

Figure 9: Threshold signature functionality


To construct a protocol for threshold GPV signatures we need threshold analogues of GenTrap and
SampleD; these are the functionalities $\mathcal{F}_{\mathrm{KG}}$ and $\mathcal{F}_{\mathrm{GS}}$ (from Section 3), respectively. $\mathcal{F}_{\mathrm{KG}}$ produces $A$ as
usual, but each party $i$ receives a share $[\![R]\!]^i$ of the trapdoor. To produce a signature, each party $i$ in a
quorum of signers simply calls $\mathcal{F}_{\mathrm{GS}}$.sample with his share $[\![R]\!]^i$, and this allows them to collectively produce
a signature $\sigma$.

In Figure 10 we present a protocol for threshold GPV signatures in the ($\mathcal{F}_{\mathrm{KG}}$, $\mathcal{F}_{\mathrm{GS}}$)-hybrid model. Note
that it obeys all the constraints on the usage of $\mathcal{F}_{\mathrm{KG}}$ and $\mathcal{F}_{\mathrm{GS}}$ described in Section 3.2.3. Its security is easily
proved using the correspondence between $\mathcal{F}_{\mathrm{KG}}$ and KeyGen, and $\mathcal{F}_{\mathrm{GS}}$ and Sign, so we state Theorem 4.1
without proof.


Theorem  4.1.  The  protocol  TTrhreshGPV  securely  realizes  Frsig,  assuming  the  unforgeability  of  the  GPV 
signature  scheme  (with  the  same  parameters)  under  chosen-message  attacks. 
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A  Protocols  Without  Trusted  Setup 


Here we show how to realize threshold key generation $\mathcal{F}_{\mathrm{KG}}$ and discrete Gaussian sampling $\mathcal{F}_{\mathrm{GS}}$ without
relying on any trusted setup. Given the protocol in Section 3.3 for $\mathcal{F}_{\mathrm{GS}}$, it suffices to realize $\mathcal{F}_{\mathrm{Correct}}$ and
$\mathcal{F}_{\mathrm{Perturb}}$. The rest of the section is organized as follows:

• In Section A.1 we formalize three low-level utility functionalities $\mathcal{F}_{\mathrm{Blind}}$, $\mathcal{F}_{\mathrm{Mult}}$ and $\mathcal{F}_{\mathrm{SampZ}}$ used by
several of our protocols, and we describe how these functionalities are realized.

• In Section A.2 we give a protocol realizing $\mathcal{F}_{\mathrm{KG}}$ using the utility functionalities. This simple protocol
and security analysis are representative of the techniques we use (in more complex ways) in later
protocols as well.

• In Section A.3 we give a realization of $\mathcal{F}_{\mathrm{Correct}}$ that uses an additional utility functionality $\mathcal{F}_{\mathrm{Gadget}}$,
which we define and realize there.

• Finally, in Section A.4 we realize $\mathcal{F}_{\mathrm{Perturb}}$ using a simple extension of $\mathcal{F}_{\mathrm{SampZ}}$, which we also realize
there.


A.1  Utility Functionalities

We first present the low-level utility functionalities.

Blinding. The blinding functionality $\mathcal{F}_{\mathrm{Blind}}$ (Figure 11) simply accepts shares of some value over an arbitrary
additive group $G$, and distributes fresh shares of the same value. Our later protocols will use blinding and the
homomorphic properties of secret sharing to reveal the values of shared secrets modulo lattices, and nothing
more.

Functionality $\mathcal{F}_{\mathrm{Blind}}$

Blind: Upon receiving (blind, sid, $[\![x]\!]^i \in G$) from at least $h$ honest parties in $V$:

• Reconstruct $x$ and generate a fresh sharing $[\![y]\!]$ (over $G$) of $y = x$.

• Send (blind, sid, $[\![y]\!]^i$) to each party $i$ in $V$, and (blind, sid, $G$) to the adversary.

Figure 11: Blinding functionality

Realizations of $\mathcal{F}_{\mathrm{Blind}}$ in various communication models are standard. For example, realizing it against
semi-honest corruptions with private channels is very simple: add sufficiently many player-generated
sharings of 0 to the original shares. For malicious corruptions one can use, e.g., subprotocols of the
BGW [BGW88, ALR11] or RB [RB89] protocols, which run in a constant number of rounds. We leave the
implementation of $\mathcal{F}_{\mathrm{Blind}}$ unspecified and simply work in the $\mathcal{F}_{\mathrm{Blind}}$-hybrid model where needed. We remark
that our protocols use $\mathcal{F}_{\mathrm{Blind}}$ only during initialization, so the interaction required to implement it is limited to
the offline phase.
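
As an illustration only, the semi-honest realization mentioned above (adding player-generated sharings of 0) can be sketched in the clear. The sketch assumes Shamir sharing over a prime field as the concrete secret-sharing scheme and a single process that plays all parties; the modulus `P` and the helper names `share`, `reconstruct` and `blind` are hypothetical choices made for this example, not part of the protocols in this appendix.

```python
import random

P = 2**61 - 1  # illustrative prime modulus (an assumption of this sketch)

def share(secret, t, n, rng):
    """Shamir-share `secret` with a random degree-t polynomial; party i gets f(i)."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P
            for i in range(1, n + 1)]

def reconstruct(points):
    """Lagrange-interpolate f(0) from a list of (party index, share) pairs."""
    total = 0
    for i, yi in points:
        num, den = 1, 1
        for j, _ in points:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def blind(shares, t, rng):
    """Every party deals a fresh sharing of 0; each party adds what it receives."""
    n = len(shares)
    out = list(shares)
    for _dealer in range(n):
        zero = share(0, t, n, rng)
        out = [(s + z) % P for s, z in zip(out, zero)]
    return out
```

Because sharing is linear, the blinded shares still reconstruct to the original value, while the individual shares are re-randomized by the added zero-sharings.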

Multiplication. The multiplication functionality $\mathcal{F}_{\mathrm{Mult}}$ (Figure 12) takes shares of two values $x, y$ in a ring
$\mathbb{Z}_{q^d}$ and returns fresh shares of their product $x \cdot y$ (modulo $q^d$) to the respective parties. By the homomorphic
properties of secret sharing, this generalizes immediately (via local computation alone) to products of vectors
and/or matrices $\mathbf{X}, \mathbf{Y}$, so we write the functionality to support this more general capability. To realize $\mathcal{F}_{\mathrm{Mult}}$
one can use any statistically secure protocol, such as the constant-round protocols of [BGW88, RB89, ALR11];
we leave this choice unspecified and simply work in the $\mathcal{F}_{\mathrm{Mult}}$-hybrid model where needed.

Functionality $\mathcal{F}_{\mathrm{Mult}}$

Multiply: Upon receiving (mult, sid, $[\![\mathbf{X}]\!]^i \in \mathbb{Z}_{q^d}^{h \times \ell}$, $[\![\mathbf{Y}]\!]^i \in \mathbb{Z}_{q^d}^{\ell \times w}$) from at least $h$ honest parties $i$ in $V$:

• Reconstruct $\mathbf{X}, \mathbf{Y}$ from the shares $[\![\mathbf{X}]\!]^i$, $[\![\mathbf{Y}]\!]^i$, respectively.

• Generate a fresh sharing $[\![\mathbf{Z}]\!]$ of $\mathbf{Z} = \mathbf{X} \cdot \mathbf{Y} \in \mathbb{Z}_{q^d}^{h \times w}$.

• Send (mult, sid, $[\![\mathbf{Z}]\!]^i$) to each party $i$ in $V$, and (mult, sid, $h \times \ell$, $\ell \times w$, $d$) to the adversary.

Figure 12: Multiplication functionality


Sampling integers. Several of our protocols rely on a low-level functionality $\mathcal{F}_{\mathrm{SampZ}}$ (Figure 13) for
sampling discrete Gaussians over the integers $\mathbb{Z}$. At a high level, the sample command produces shares of
a discrete Gaussian variable $x \in \mathbb{Z}$ with a given parameter, where the sharing is over the additive group
$\mathbb{Z}_{q^d}$ (i.e., with $d$ digits of precision), and distributes these shares $[\![x]\!]^i$ to the respective parties. Later on in
Section A.4.1 we will extend $\mathcal{F}_{\mathrm{SampZ}}$ with some additional commands, but for now we only need the sample
command.

Functionality $\mathcal{F}_{\mathrm{SampZ}}$

Sample: Upon receiving (sample, sid, $h \times w$, $z$, $d$) from at least $h$ honest parties in $V$:

• Sample $\mathbf{X} \leftarrow D_{\mathbb{Z}, z \cdot \omega_n}^{h \times w}$ and generate a fresh sharing $[\![\mathbf{X}]\!]$ over $\mathbb{Z}_{q^d}$.

• Send (sample, sid, $[\![\mathbf{X}]\!]^i$) to each party $i$ in $V$ and (sample, sid, $h \times w$, $z$, $d$) to the adversary.

Figure 13: Simplified integer sampling functionality


Because we do not know of any highly efficient algorithms for sampling discrete Gaussians, our realization
uses the general “inverse transform” sampling algorithm, and implements it securely using multiparty
computation tools. Recall that inverse sampling involves a precomputed lookup table that closely approximates
the cumulative distribution function. To sample, one chooses a uniformly random $u \in [0, 1)$ and looks up the
corresponding output value in the table. An arithmetic circuit implementing this algorithm can be written as an
AND of several interval tests on the input $u$, so the depth of the circuit is roughly the precision (number of
digits) of the entries in the lookup table, and the width of the circuit is roughly the number of entries in the
table. (Other trade-offs between depth and width are possible as well.)
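
The table-lookup idea can be illustrated in the clear, outside any multiparty protocol. The sketch below is a non-secure, single-party version that assumes the Gaussian weight $\exp(-\pi x^2 / s^2)$ and truncates the support at a multiple of $s$; the function names and the `tail` cutoff are hypothetical choices for this example.

```python
import bisect
import math
import random

def build_cdf_table(s, tail=10):
    """Tabulate the CDF of a discrete Gaussian over Z with weights exp(-pi x^2 / s^2),
    truncated to |x| <= tail * s (the truncation is an assumption of this sketch)."""
    bound = int(math.ceil(tail * s))
    values = list(range(-bound, bound + 1))
    weights = [math.exp(-math.pi * x * x / (s * s)) for x in values]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point shortfall in the last entry
    return values, cdf

def sample(values, cdf, rng):
    """Inverse-transform sampling: draw u in [0, 1) and find its interval in the table."""
    u = rng.random()
    return values[bisect.bisect_right(cdf, u)]
```

Each lookup is a search for the interval containing `u`, which corresponds to the interval tests described above; the table has one entry per supported output value.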

Importantly, we can implement the inverse sampling method for discrete Gaussians using a table of
size proportional to $q = \mathrm{poly}(n)$ for very large parameters $z$, even though the distribution has support
size proportional to $z$. This is because for $z \in [q^j, q^{j+1})$, the discrete Gaussian of parameter $z$ can be
decomposed using Lemma 2.5 as a convolution of $j$ discrete Gaussians over $q^j\mathbb{Z}, q^{j-1}\mathbb{Z}, \ldots, \mathbb{Z}$ having
respective parameters roughly $z, z/q, \ldots, z/q^j$. (Note that each parameter can be chosen to be larger than
the smoothing parameter of the respective lattice.) Each of these distributions is highly concentrated on only
$\mathrm{poly}(n)$ outputs.

Finally, we emphasize that our higher-level protocols use $\mathcal{F}_{\mathrm{SampZ}}$ only in their key-generation or initialization
phases, and only with fixed, public Gaussian parameters, so any inefficiencies in a realization of $\mathcal{F}_{\mathrm{SampZ}}$
are limited to the offline phase.


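A.2  Realizing $\mathcal{F}_{\mathrm{KG}}$

The protocol $\pi_{\mathrm{KG}}$ (Figure 14) realizing $\mathcal{F}_{\mathrm{KG}}$ in the ($\mathcal{F}_{\mathrm{Blind}}$, $\mathcal{F}_{\mathrm{SampZ}}$)-hybrid model is straightforward, given
the homomorphic properties of the secret-sharing scheme and the simple operation of the standalone trapdoor
generator, which just multiplies a public uniform matrix $\bar{\mathbf{A}}$ with a secret Gaussian-distributed matrix $\mathbf{R}$. The
parties first get shares of a Gaussian-distributed trapdoor $\mathbf{R}$ using $\mathcal{F}_{\mathrm{SampZ}}$, then announce blinded shares of
$\mathbf{A}_1 = -\bar{\mathbf{A}}\mathbf{R} \bmod q$ and reconstruct $\mathbf{A}_1$ to determine the public key $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{A}_1]$. The blinding is needed so
that the announced shares reveal only $\mathbf{A}_1$, and nothing more about the honest parties’ shares $[\![\mathbf{R}]\!]^i$ themselves.

Protocol $\pi_{\mathrm{KG}}$ in the ($\mathcal{F}_{\mathrm{Blind}}$, $\mathcal{F}_{\mathrm{SampZ}}$)-hybrid model

Generate: On input (gen, sid, $\bar{\mathbf{A}} \in \mathbb{Z}_q^{n \times \bar{m}}$, $\mathbf{H}^* \in \mathbb{Z}_q^{n \times n}$, $z$), party $i$ does:

• Call $\mathcal{F}_{\mathrm{SampZ}}$(sample, sid, $\bar{m} \times nk$, $z$, 1) and receive (sample, sid, $[\![\mathbf{R}]\!]^i$).

• Call $\mathcal{F}_{\mathrm{Blind}}$(blind, sid, $-\bar{\mathbf{A}}[\![\mathbf{R}]\!]^i$) and receive (blind, sid, $[\![\mathbf{A}_1]\!]^i$).

• Broadcast $[\![\mathbf{A}_1]\!]^i$ and reconstruct $\mathbf{A}_1 = -\bar{\mathbf{A}}\mathbf{R}$ from the announced shares.

• Output (gen, sid, $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{H}^* \cdot \mathbf{G} + \mathbf{A}_1]$, $[\![\mathbf{R}]\!]^i$).

Figure 14: Key generation protocol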
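
The homomorphic step at the heart of the key generation protocol can be checked on a toy instance. The following sketch is only an illustration: it assumes simple additive sharing modulo $q$ in place of the threshold scheme, omits the blinding call, uses small uniform integers as a stand-in for Gaussian entries of $\mathbf{R}$, and all helper names are hypothetical.

```python
import random

def matmul_mod(A, B, q):
    """Matrix product A * B with entries reduced modulo q."""
    return [[sum(a * b for a, b in zip(row, col)) % q for col in zip(*B)]
            for row in A]

def add_mod(X, Y, q):
    """Entry-wise sum of two matrices modulo q."""
    return [[(x + y) % q for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def neg_mod(X, q):
    """Entry-wise negation of a matrix modulo q."""
    return [[(-x) % q for x in row] for row in X]

def additive_share_matrix(M, parties, q, rng):
    """Split M into `parties` matrices that sum to M modulo q."""
    rows, cols = len(M), len(M[0])
    shares = [[[rng.randrange(q) for _ in range(cols)] for _ in range(rows)]
              for _ in range(parties - 1)]
    last = [[(M[r][c] - sum(s[r][c] for s in shares)) % q for c in range(cols)]
            for r in range(rows)]
    return shares + [last]

def reconstruct_a1(A_bar, R_shares, q):
    """Each party computes -A_bar * [R]^i locally; summing the announced
    shares yields A_1 = -A_bar * R mod q without anyone holding R itself."""
    announced = [neg_mod(matmul_mod(A_bar, R_i, q), q) for R_i in R_shares]
    A1 = announced[0]
    for share in announced[1:]:
        A1 = add_mod(A1, share, q)
    return A1
```

In the actual protocol the locally computed shares are first passed through the blinding functionality, so that the broadcast values reveal nothing beyond $\mathbf{A}_1$; the sketch only demonstrates that the linear map commutes with sharing.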


A simulator $\mathcal{S}_{\mathrm{KG}}$ for demonstrating the security of $\pi_{\mathrm{KG}}$ is provided in Figure 15. Essentially, security boils
down to the fact that the announced blinded shares $-\bar{\mathbf{A}}[\![\mathbf{R}]\!]^i$ in the protocol $\pi_{\mathrm{KG}}$ form a uniformly random
sharing of $\mathbf{A}_1 = -\bar{\mathbf{A}}\mathbf{R}$ that is independent of the honest parties’ outputs $[\![\mathbf{R}]\!]^i$, which is exactly what $\mathcal{S}_{\mathrm{KG}}$
constructs to simulate the broadcast messages. A full proof is a straightforward application of this observation,
and of the privacy, robustness, and homomorphic properties of the secret-sharing scheme, so we omit it.

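Theorem A.1. Protocol $\pi_{\mathrm{KG}}$ statistically realizes $\mathcal{F}_{\mathrm{KG}}$ in the ($\mathcal{F}_{\mathrm{Blind}}$, $\mathcal{F}_{\mathrm{SampZ}}$)-hybrid model for $t$-limited
environments in 5°.

Simulator $\mathcal{S}_{\mathrm{KG}}$

Generate: Upon receiving (gen, sid, $\mathbf{A} = [\bar{\mathbf{A}} \mid \mathbf{H}^* \cdot \mathbf{G} + \mathbf{A}_1]$, $\mathbf{H}^*$, $z$) from $\mathcal{F}_{\mathrm{KG}}$:

• Generate a fresh sharing of $\mathbf{A}_1$ over $\mathbb{Z}_q$.

• For all currently corrupted parties $i$, and whenever $\mathcal{Z}$ later requests to corrupt a party $i$, receive the
share $[\![\mathbf{R}]\!]^i$ from $\mathcal{F}_{\mathrm{KG}}$, and reveal to $\mathcal{Z}$ the following functionality outputs to party $i$:

- (sample, sid, $[\![\mathbf{R}]\!]^i$) on behalf of $\mathcal{F}_{\mathrm{SampZ}}$;

- (blind, sid, $[\![\mathbf{A}_1]\!]^i$) on behalf of $\mathcal{F}_{\mathrm{Blind}}$.

• Broadcast $[\![\mathbf{A}_1]\!]^i$ on behalf of each honest party $i$.

Figure 15: Simulator for $\pi_{\mathrm{KG}}$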


A.3  Realizing $\mathcal{F}_{\mathrm{Correct}}$

Recall that $\mathcal{F}_{\mathrm{Correct}}$ samples and distributes shares of a (non-spherical Gaussian) vector $\mathbf{y}$ from a desired coset
of $\Lambda^{\perp}(\mathbf{A})$. Section 3.2.2 describes how to realize $\mathcal{F}_{\mathrm{Correct}}$ with trusted setup, partly by precomputing shares
of samples from each coset of $\Lambda^{\perp}(\mathbf{g}^t)$. We next describe how these shares can be obtained from a utility
functionality called $\mathcal{F}_{\mathrm{Gadget}}$ (Figure 16), and how it can easily be realized.

A.3.1  Gadget Functionality

The functionality $\mathcal{F}_{\mathrm{Gadget}}$ (Figure 16) relates to the special gadget vector $\mathbf{g}$ and lattice $\Lambda^{\perp}(\mathbf{g}^t)$, as defined
in [MP12] and reviewed in Section 3.1. Recall that we have a fixed public vector $\mathbf{g}^t = [1, 2, 4, \ldots, 2^{k-1}] \in
\mathbb{Z}_q^{1 \times k}$ for $k = \lceil \lg q \rceil$, which defines a full-rank lattice $\Lambda^{\perp}(\mathbf{g}^t) \subset \mathbb{Z}^k$ of determinant $q$ whose smoothing
parameter is bounded by $s_g \cdot \omega_n$, where $s_g \le \sqrt{5}$ is a known constant. The functionality generates shares of a
discrete Gaussian over the coset $\Lambda_v^{\perp}(\mathbf{g}^t)$ for any desired $v \in \mathbb{Z}_q$, by running any of the efficient algorithms
described in [MP12].

Functionality $\mathcal{F}_{\mathrm{Gadget}}$

Sample coset: Upon receiving (cosetsample, sid, $v \in \mathbb{Z}_q$) from at least $h$ honest parties in $V$:

• Sample $\mathbf{z} \leftarrow D_{\Lambda_v^{\perp}(\mathbf{g}^t),\, s_g \cdot \omega_n}$ and generate a uniformly random sharing $[\![\mathbf{z}]\!]$.

• Send (cosetsample, sid, $[\![\mathbf{z}]\!]^i$) to each party $i$ in $V$, and send (cosetsample, sid, $v$) to the adversary.

Figure 16: Functionality for operations related to the gadget lattice $\Lambda^{\perp}(\mathbf{g}^t)$


Realizing $\mathcal{F}_{\mathrm{Gadget}}$ is straightforward in the ($\mathcal{F}_{\mathrm{SampZ}}$, $\mathcal{F}_{\mathrm{Blind}}$)-hybrid model, using the homomorphic properties
of secret sharing: essentially, the parties request shares of a Gaussian-distributed $\mathbf{z} \in \mathbb{Z}^k$ from $\mathcal{F}_{\mathrm{SampZ}}$,
then broadcast blinded shares of the syndrome $u = \langle \mathbf{g}, \mathbf{z} \rangle \bmod q$ and recover $u$, repeating until $u = v$. (The
blinding is needed so that nothing more than the syndrome is revealed about $\mathbf{z}$.) Implemented naively as in
Figure 17, the expected number of trials (which may be performed in parallel) is almost exactly $q = \mathrm{poly}(n)$,
because the syndrome $u$ is negligibly far from uniform since $\mathbf{z}$’s Gaussian parameter is at least the smoothing
parameter of $\Lambda^{\perp}(\mathbf{g}^t)$. Alternatively, shares of samples having the wrong syndrome can be stored away and
used as needed later on. Note that in any case, $\mathcal{F}_{\mathrm{Gadget}}$ is only ever called in the offline phase of $\pi_{\mathrm{Correct}}$
(Figure 19), so efficiency is not a top priority here.
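
The naive repeat-until-match loop can be sketched in the clear as follows. This is a single-party illustration under stated assumptions: there is no secret sharing or blinding, a rounded continuous Gaussian stands in for an exact discrete Gaussian sampler, and the function names are hypothetical.

```python
import math
import random

def gadget_vector(q):
    """g = [1, 2, 4, ..., 2^(k-1)] with k = ceil(lg q), for q >= 2."""
    k = (q - 1).bit_length()
    return [1 << i for i in range(k)]

def sample_z(k, s, rng):
    """Stand-in sampler: round a continuous Gaussian of parameter s
    (standard deviation s / sqrt(2*pi)) in each of k coordinates."""
    sigma = s / math.sqrt(2 * math.pi)
    return [round(rng.gauss(0.0, sigma)) for _ in range(k)]

def sample_coset(v, q, s, rng):
    """Repeat until the syndrome <g, z> mod q equals the requested v.
    Returns the accepted vector and the number of trials used."""
    g = gadget_vector(q)
    trials = 0
    while True:
        trials += 1
        z = sample_z(len(g), s, rng)
        u = sum(gi * zi for gi, zi in zip(g, z)) % q
        if u == v:
            return z, trials
```

When the syndrome is close to uniform over $\mathbb{Z}_q$, each trial succeeds with probability about $1/q$, which is where the expected trial count of roughly $q$ comes from.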

A simulator $\mathcal{S}_{\mathrm{Gadget}}$ for demonstrating the security of our protocol is provided in Figure 18. For a $t$-limited

Protocol $\pi_{\mathrm{Gadget}}$ in the ($\mathcal{F}_{\mathrm{SampZ}}$, $\mathcal{F}_{\mathrm{Blind}}$)-hybrid model

Sample coset: On input (cosetsample, sid, $v \in \mathbb{Z}_q$), party $i$ does:

• Call $\mathcal{F}_{\mathrm{SampZ}}$(sample, sid, $k \times 1$, $s_g$, 1) and receive (sample, sid, $[\![\mathbf{z}]\!]^i$).

• Call $\mathcal{F}_{\mathrm{Blind}}$(blind, sid, $\langle \mathbf{g}, [\![\mathbf{z}]\!]^i \rangle \bmod q$) and receive (blind, sid, $[\![u]\!]^i$).

• Broadcast $[\![u]\!]^i$ and reconstruct $u = \langle \mathbf{g}, \mathbf{z} \rangle$ from the broadcast shares.

• If $u = v$, output (cosetsample, sid, $[\![\mathbf{z}]\!]^i$). Otherwise, repeat.

Figure 17: Protocol for gadget operations

Simulator $\mathcal{S}_{\mathrm{Gadget}}$

Sample coset: Upon receiving (cosetsample, sid, $v$) from $\mathcal{F}_{\mathrm{Gadget}}$:

• Choose a uniformly random $u \in \mathbb{Z}_q$, and generate fresh sharings of $u$ and of $\mathbf{z} = \mathbf{0} \in \mathbb{Z}_q^k$.

• For each currently corrupted party $i$, reveal to $\mathcal{Z}$ the following functionality outputs to party $i$:

- (sample, sid, $[\![\mathbf{z}]\!]^i$) on behalf of $\mathcal{F}_{\mathrm{SampZ}}$;

- (blind, sid, $[\![u]\!]^i$) on behalf of $\mathcal{F}_{\mathrm{Blind}}$.

• Broadcast $[\![u]\!]^i$ on behalf of all honest parties $i$.

• Unless $u = v$, repeat.

Corruption: When $\mathcal{Z}$ requests to corrupt party $i$, for each previous call to cosetsample, reveal the corresponding
messages (sample, sid, $[\![\mathbf{z}]\!]^i$) and (blind, sid, $[\![u]\!]^i$) to party $i$ on behalf of $\mathcal{F}_{\mathrm{SampZ}}$ and $\mathcal{F}_{\mathrm{Blind}}$, respectively.

Figure 18: Simulator for $\pi_{\mathrm{Gadget}}$


adversary, the value of $\mathbf{z} = \mathbf{0}$ and the honest parties’ shares remain information-theoretically hidden. The
announced (blinded) shares of $u$ in the protocol $\pi_{\mathrm{Gadget}}$ form a uniformly random (and independent of the
honest parties’ outputs $[\![\mathbf{z}]\!]^i$) sharing of the (nearly) uniformly random syndrome $u = \langle \mathbf{g}, \mathbf{z} \rangle$, which is exactly
what $\mathcal{S}_{\mathrm{Gadget}}$ constructs to simulate the broadcast messages. A full proof is a straightforward application of
these observations and of the privacy, robustness, and homomorphic properties of secret sharing.


A.3.2  Protocol  and  Security  Analysis 

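The protocol $\pi_{\mathrm{Correct}}$ in the ($\mathcal{F}_{\mathrm{Mult}}$, $\mathcal{F}_{\mathrm{Gadget}}$)-hybrid model is defined formally in Figure 19. In the initialization
step, for each $j \in [n]$ and $v \in \mathbb{Z}_q$ the parties populate each of their local queues $Q_{j,v}$ with at least $B$ entries,
in the following way: each party $i$ uses $\mathcal{F}_{\mathrm{Gadget}}$, its shares of the trapdoor $\mathbf{R}$, and $\mathcal{F}_{\mathrm{Mult}}$ to obtain a share
of $\mathbf{y}_{j,v} = \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right](\mathbf{e}_j \otimes \mathbf{z}_{j,v})$ for Gaussian-distributed $\mathbf{z}_{j,v} \in \Lambda_v^{\perp}(\mathbf{g}^t)$, and places the share in a queue $Q_{j,v}$.
(Regarding the arguments to the call to $\mathcal{F}_{\mathrm{Mult}}$, note that $\left[\begin{smallmatrix}[\![\mathbf{R}]\!]^i\\ \mathbf{I}\end{smallmatrix}\right]$ is a valid $i$th share of $\left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]$, via a constant
sharing polynomial for $\mathbf{I}$, and $\mathbf{e}_j \otimes [\![\mathbf{z}_{j,v}]\!]^i$ is similarly a valid $i$th share of $\mathbf{e}_j \otimes \mathbf{z}_{j,v}$.) To later answer a correct
request for syndrome $\mathbf{v} \in \mathbb{Z}_q^n$, each party just draws a share from each of $Q_{1,v_1}, \ldots, Q_{n,v_n}$ and sums these
shares. By the homomorphic properties of secret sharing, this yields a share of $\mathbf{y} = \sum_{j \in [n]} \mathbf{y}_{j,v_j} = \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right]\mathbf{z}$ for
Gaussian $\mathbf{z} = (\mathbf{z}_1, \ldots, \mathbf{z}_n) \in \Lambda_{\mathbf{v}}^{\perp}(\mathbf{G})$, as desired.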


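Theorem A.2. Protocol $\pi_{\mathrm{Correct}}$ statistically realizes $\mathcal{F}_{\mathrm{Correct}}$ in the ($\mathcal{F}_{\mathrm{Mult}}$, $\mathcal{F}_{\mathrm{Gadget}}$)-hybrid model for $t$-limited
environments in 6Z‘ .

Protocol $\pi_{\mathrm{Correct}}$ in the ($\mathcal{F}_{\mathrm{Mult}}$, $\mathcal{F}_{\mathrm{Gadget}}$)-hybrid model

Initialize: On input (init, sid, $[\![\mathbf{R}]\!]^i$, $B$), party $i$ does:

• Locally store (sid, $[\![\mathbf{R}]\!]^i$) and initialize local queues $Q_{j,v}$ for each $j \in [n]$ and $v \in \mathbb{Z}_q$.

• For each $j \in [n]$, while there exists some $v \in \mathbb{Z}_q$ such that $Q_{j,v}$ has fewer than $B$ entries:

- Call $\mathcal{F}_{\mathrm{Gadget}}$(cosetsample, sid, $v$) and receive (cosetsample, sid, $[\![\mathbf{z}_{j,v}]\!]^i$).

- Call $\mathcal{F}_{\mathrm{Mult}}$(mult, sid, $\left[\begin{smallmatrix}[\![\mathbf{R}]\!]^i\\ \mathbf{I}\end{smallmatrix}\right]$, $\mathbf{e}_j \otimes [\![\mathbf{z}_{j,v}]\!]^i$) and receive (mult, sid, $[\![\mathbf{y}_{j,v}]\!]^i$), where $\mathbf{y}_{j,v} = \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right](\mathbf{e}_j \otimes \mathbf{z}_{j,v})$.

- Place $[\![\mathbf{y}_{j,v}]\!]^i$ in local queue $Q_{j,v}$.

• Output (init, sid).

Correct: On input (correct, sid, $\mathbf{v}$), if fewer than $B$ calls to correct have already been made, party $i$ does:

• For each $j \in [n]$, dequeue an entry $[\![\mathbf{y}_{j,v_j}]\!]^i$ from $Q_{j,v_j}$.

• Locally compute $[\![\mathbf{y}]\!]^i = \sum_{j \in [n]} [\![\mathbf{y}_{j,v_j}]\!]^i$.

• Output (correct, sid, $[\![\mathbf{y}]\!]^i$).

Figure 19: Syndrome correction protocol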
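
The queue discipline of the correction protocol (precompute per-coordinate, per-coset samples offline, then dequeue and sum online) can be illustrated with a toy, single-party model. The sketch below makes several simplifying assumptions that are not part of the protocol: it works on cleartext vectors rather than shares, it drops the multiplication by the trapdoor matrix so that only the gadget part $\mathbf{e}_j \otimes \mathbf{z}_{j,v}$ is queued, it uses small uniform integers as a stand-in for Gaussian samples, and the class and function names are hypothetical.

```python
import random
from collections import deque

Q_MOD = 5           # toy modulus q
G_VEC = [1, 2, 4]   # gadget vector for q = 5, so k = 3

def sample_gadget_coset(v, rng):
    """Toy stand-in: rejection-sample a short z with <g, z> = v mod q."""
    while True:
        z = [rng.randint(-2, 2) for _ in G_VEC]
        if sum(g * x for g, x in zip(G_VEC, z)) % Q_MOD == v:
            return z

def kron_unit(j, n, z):
    """Compute e_j (x) z: place z in block j of a length n*k zero vector."""
    k = len(z)
    out = [0] * (n * k)
    out[j * k:(j + 1) * k] = z
    return out

class CorrectionQueues:
    def __init__(self, n, budget, rng):
        """Offline phase: fill every queue Q[j, v] with `budget` entries."""
        self.n = n
        self.remaining = budget
        self.queues = {
            (j, v): deque(kron_unit(j, n, sample_gadget_coset(v, rng))
                          for _ in range(budget))
            for j in range(n) for v in range(Q_MOD)
        }

    def correct(self, syndrome):
        """Online phase: dequeue one entry per coordinate and sum them."""
        if self.remaining <= 0:
            raise RuntimeError("budget of correct calls exhausted")
        self.remaining -= 1
        y = [0] * (self.n * len(G_VEC))
        for j, vj in enumerate(syndrome):
            entry = self.queues[(j, vj)].popleft()
            y = [a + b for a, b in zip(y, entry)]
        return y
```

The online step involves only local dequeues and additions, which mirrors why the Correct command needs no interaction once the queues are populated.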


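Simulator $\mathcal{S}_{\mathrm{Correct}}$

Initialize: Upon receiving (init, sid, $B$) from $\mathcal{F}_{\mathrm{Correct}}$:

• Initialize empty lists $Q_{j,v}$ for each $j \in [n]$ and $v \in \mathbb{Z}_q$.

• For each $j \in [n]$, while there exists some $v \in \mathbb{Z}_q$ such that $Q_{j,v}$ has fewer than $B$ unused entries:

- Generate a fresh sharing of $\mathbf{z}_{j,v} = \mathbf{0} \in \mathbb{Z}_q^k$, and send (cosetsample, sid, $[\![\mathbf{z}_{j,v}]\!]^i$) on behalf
of $\mathcal{F}_{\mathrm{Gadget}}$ to each currently corrupted party $i$ in $V$.

- Generate a fresh sharing $[\![\mathbf{y}_{j,v}]\!]$ for $\mathbf{y}_{j,v} = \mathbf{0} \in \mathbb{Z}_q^m$, and send (mult, sid, $[\![\mathbf{y}_{j,v}]\!]^i$) on behalf of
$\mathcal{F}_{\mathrm{Mult}}$ to each currently corrupted $i$ in $V$.

- Store $([\![\mathbf{z}_{j,v}]\!], [\![\mathbf{y}_{j,v}]\!])$ as an unused entry at the end of list $Q_{j,v}$.

Correct: Upon receiving (correct, sid, $\mathbf{v}$) from $\mathcal{F}_{\mathrm{Correct}}$:

• For each $j \in [n]$, look up the next unused entry $([\![\mathbf{z}_{j,v_j}]\!], [\![\mathbf{y}_{j,v_j}]\!])$ from $Q_{j,v_j}$, and mark it as used for
this call to correct. For each currently corrupted party $i$ in $V$, send $[\![\mathbf{y}]\!]^i = \sum_{j \in [n]} [\![\mathbf{y}_{j,v_j}]\!]^i$ to $\mathcal{F}_{\mathrm{Correct}}$
as the desired share for party $i$.

Corruption: When $\mathcal{Z}$ requests to corrupt party $i$,

• Receive party $i$’s share $[\![\mathbf{y}]\!]^i$ for each previous call of the form (correct, sid, $\mathbf{v}$). Look up the $n$
corresponding (used) entries $([\![\mathbf{z}_{j,v_j}]\!], [\![\mathbf{y}_{j,v_j}]\!])$ in $Q_{j,v_j}$, and update the value $[\![\mathbf{y}_{n,v_n}]\!]^i$ so that
$[\![\mathbf{y}]\!]^i = \sum_{j \in [n]} [\![\mathbf{y}_{j,v_j}]\!]^i$.

• For all entries $([\![\mathbf{z}_{j,v}]\!], [\![\mathbf{y}_{j,v}]\!])$, both used and unused, in each list $Q_{j,v}$, reveal to $\mathcal{Z}$ the messages
(cosetsample, sid, $[\![\mathbf{z}_{j,v}]\!]^i$) and (mult, sid, $[\![\mathbf{y}_{j,v}]\!]^i$) to party $i$ on behalf of $\mathcal{F}_{\mathrm{Gadget}}$ and $\mathcal{F}_{\mathrm{Mult}}$, respectively.

Figure 20: Simulator for $\pi_{\mathrm{Correct}}$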


Proof sketch. A simulator $\mathcal{S}_{\mathrm{Correct}}$ for demonstrating the security of $\pi_{\mathrm{Correct}}$ is provided in Figure 20. The
only subtlety lies in the fact that outputs from helper functionalities during precomputation must be simulated
before knowing which parties will be corrupted when the corresponding correct calls are made later on. As
mentioned in Section 3.2.2, this is why we designed $\mathcal{F}_{\mathrm{Correct}}$ to ask the adversary for shares for the corrupted
parties: the simulator generates its own shares when simulating the precomputation, and provides them to the
functionality upon request. Then security boils down to the fact that even though $\mathbf{y}_{j,v_j} = \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right](\mathbf{e}_j \otimes \mathbf{z}_{j,v_j})$,
all shares of $\mathbf{z}_{j,v_j}$ (that the adversary sees) are uniform and independent of the corresponding shares of $\mathbf{y}_{j,v_j}$
since $\mathcal{F}_{\mathrm{Mult}}$ blinds its output, so the queuing strategy employed by $\mathcal{S}_{\mathrm{Correct}}$ indeed produces shares of $\mathbf{y}$ with
the desired distribution. □


A.4  Realizing $\mathcal{F}_{\mathrm{Perturb}}$

Recall that $\mathcal{F}_{\mathrm{Perturb}}$ (Figure 3) distributes shares of perturbations $\mathbf{p}$ drawn from a discrete Gaussian
distribution over $\mathbb{Z}^m$ whose covariance $\Sigma_{\mathbf{p}}$ depends on the trapdoor $\mathbf{R}$. In the standalone setting, this is
straightforward: first generate a continuous Gaussian $\mathbf{p}' \in \mathbb{R}^m$ whose covariance is $\Sigma_{\mathbf{p}}$ minus a multiple of
the identity $\mathbf{I}$, then randomly round each coordinate of $\mathbf{p}'$ to a nearby integer (see [Pei10] for details). In the
threshold setting, generating a good perturbation seems quite a bit more difficult, because neither $\mathbf{p}$ nor its
covariance $\Sigma_{\mathbf{p}}$ can be revealed, since they leak the trapdoor $\mathbf{R}$. Fortunately, we can give a distributed protocol
that emulates the standalone rounding procedure up to sufficient precision.


A.4.1  Extending $\mathcal{F}_{\mathrm{SampZ}}$

We first extend the functionality $\mathcal{F}_{\mathrm{SampZ}}$ (originally defined in Section A.1) with two additional commands,
cosetsample and rround, which support randomized rounding of a shared value $x \in q^{-j}\mathbb{Z}$ to the integers $\mathbb{Z}$.
Note that while cosetsample and rround are defined for (scalings of) the integer lattice $\mathbb{Z}$, the commands
immediately generalize to vectors and matrices, component-wise. (This is simply because spherical Gaussians
over cosets of $\mathbb{Z}^{h \times w}$ are just product distributions of Gaussians over cosets of $\mathbb{Z}$.) This extended functionality
is defined formally in Figure 21.


Functionality $\mathcal{F}_{\mathrm{SampZ}}$

Sample: Upon receiving (sample, sid, $h \times w$, $z$, $d$) from at least $h$ honest parties in $V$:

• Sample $\mathbf{X} \leftarrow D_{\mathbb{Z}, z \cdot \omega_n}^{h \times w}$ and generate a fresh sharing $[\![\mathbf{X}]\!]$ over $\mathbb{Z}_{q^d}$.

• Send (sample, sid, $[\![\mathbf{X}]\!]^i$) to each party $i$ in $V$ and (sample, sid, $h \times w$, $z$, $d$) to the adversary.

Sample coset: Upon receiving (cosetsample, sid, $v \in q^{-j}\mathbb{Z}/q^{-j+1}\mathbb{Z}$, $z > q^{-j+1}$) from at least $h$ honest parties
in $V$:

• Sample $x \leftarrow D_{q^{-j+1}\mathbb{Z} + v,\, z \cdot \omega_n}$ and let $c = x \bmod \mathbb{Z}$.

• Generate a fresh sharing $[\![x]\!]$ over $q^{-j}\mathbb{Z}/q\mathbb{Z}$.

• Send (cosetsample, sid, $[\![x]\!]^i$, $c$) to each party $i$ in $V$, and (cosetsample, sid, $c$, $z$) to the adversary.

Randomized round: Upon receiving (rround, sid, $[\![x]\!]^i \in q^{-d}\mathbb{Z}/q\mathbb{Z}$) from at least $h$ honest parties in $V$:

• Reconstruct $x$ and let $c = x \bmod \mathbb{Z}$. Sample an integer $z \leftarrow x + D_{\mathbb{Z} - x,\, d \cdot \omega_n}$.

• Generate a fresh sharing $[\![z]\!]$ over $\mathbb{Z}_q$, and send (rround, sid, $[\![z]\!]^i$, $c$) to each party $i$ in $V$, and
(rround, sid, $c$) to the adversary.

Figure 21: Full integer sampling functionality (which replaces Figure 13)


A protocol $\pi_{\mathrm{SampZ}}$ realizing $\mathcal{F}_{\mathrm{SampZ}}$ in the $\mathcal{F}_{\mathrm{Blind}}$-hybrid model is given in Figure 22. We elaborate
informally on the implementation of the two additional commands, noting that implementation of the sample
command was discussed in Section A.1. We omit a formal security proof for $\pi_{\mathrm{SampZ}}$, but we remark that while
the protocol is somewhat more complicated than the key-generation protocol of Section A.2, its security is
based straightforwardly on the same main observations, plus Lemma 2.3.


Coset sampling. The cosetsample command (which exists mainly to support the randomized-rounding
command, described next) generates a discrete Gaussian variable $x$ with given parameter $z > q^{-j+1}$ over
the (possibly very dense) lattice $q^{-j}\mathbb{Z}$, such that $x$’s least significant base-$q$ digit is a specified $v$. It then
distributes shares $[\![x]\!]^i$ to the respective parties, along with the value $c = x \bmod \mathbb{Z}$, which also goes to the
adversary.

Naively implementing cosetsample in our protocol is very simple: the parties just use sample to generate
a Gaussian value $q^j x \in \mathbb{Z}$ with parameter $q^j z \cdot \omega_n$, shared over $\mathbb{Z}_{q^{j+1}}$, then reveal their blinded shares of
$x \bmod \mathbb{Z}$ to reconstruct $c$, repeating until the least-significant digit is $v$. Because $z > q^{-j+1}$ is large enough,
$x$’s least significant digit is nearly uniform by Corollary 2.3, and the expected number of trials is almost
exactly $q = \mathrm{poly}(n)$. Of course, this procedure throws away many samples; a more efficient implementation
would precompute many trials and store the results according to least-significant digit, so that the online phase
of the command becomes just a noninteractive table lookup. For simplicity, we formally define only the naive
implementation.

Randomized rounding. The rround command takes shares of a value $x \in q^{-d}\mathbb{Z}$ (represented modulo $q\mathbb{Z}$)
and rounds it to an integer $z \in \mathbb{Z}$ using Gaussian rounding. It returns shares to the respective parties,
along with the coset $c = x \bmod \mathbb{Z}$ of the original input, which also goes to the adversary. In our protocol,
the parties broadcast their blinded shares of $c = x \bmod \mathbb{Z}$ to reconstruct $c$, then use $d$ calls to cosetsample to
round $x$ one digit at a time, from least- to most-significant digit. Note that each call to cosetsample alters the
more-significant digits of $x \bmod \mathbb{Z}$, but these changes are public.
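
To make the target of the rround command concrete, the sketch below performs Gaussian randomized rounding of a cleartext rational value in a single step. It is a simplified, single-party illustration and not the digit-by-digit protocol: it assumes the weight $\exp(-\pi (z - x)^2 / r^2)$ for each candidate integer $z$, a truncated candidate window controlled by a hypothetical `tail` parameter, and floating-point weights.

```python
import math
import random
from fractions import Fraction

def gaussian_round(x, r, rng, tail=10):
    """Sample an integer z with probability proportional to
    exp(-pi * (z - x)^2 / r^2), i.e. a discrete Gaussian centered at x."""
    center = math.floor(x)
    span = int(math.ceil(tail * r))
    candidates = list(range(center - span, center + span + 1))
    weights = [math.exp(-math.pi * float(z - x) ** 2 / r ** 2)
               for z in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def rround(x, r, rng):
    """Return the rounded integer together with the public residue c = x mod Z."""
    c = x - math.floor(x)
    return gaussian_round(x, r, rng), c
```

For example, `rround(Fraction(43, 9), 2.0, random.Random(5))` returns an integer near 4.78 together with the residue `Fraction(7, 9)`; in the protocol the same residue is what becomes public.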


A.4.2  Protocol  and  Security  Analysis 


In brief, our perturbation protocol starts by generating a sharing of a sufficiently high-precision approximation
$\mathbf{P} \approx \sqrt{\Sigma_{\mathbf{p}}}$ with some $d$ digits of precision in its fractional part (i.e., the entries of $\mathbf{P}$ are in $q^{-d}\mathbb{Z}$). The
sharing of $\mathbf{P}$ can be precomputed as part of the key-generation phase, using general multiparty computation.
To generate a perturbation vector $\mathbf{p}$, the protocol first generates a sharing of a high-precision Gaussian random
variable $\mathbf{p}' \in q^{-d}\mathbb{Z}^m$ having covariance $\mathbf{P}\mathbf{P}^t \approx \Sigma_{\mathbf{p}}$. It does this by invoking $\mathcal{F}_{\mathrm{SampZ}}$ to generate shares of a
Gaussian-distributed $\mathbf{z} \in \mathbb{Z}^m$, and then invoking $\mathcal{F}_{\mathrm{Mult}}$ to get a fresh sharing of $\mathbf{p}' = \mathbf{P}\mathbf{z}$. The parties then
randomized-round their shared $\mathbf{p}' \in q^{-d}\mathbb{Z}^m$ to a shared final perturbation vector $\mathbf{p} \in \mathbb{Z}^m$, using the rround
command of $\mathcal{F}_{\mathrm{SampZ}}$. (Recall from A.4.1 that this command reveals $\mathbf{c} = \mathbf{p}' \bmod \mathbb{Z}^m$ publicly.) Finally, using
the secret-sharing homomorphisms the parties also reconstruct the two syndromes $\bar{\mathbf{w}}$ and $\mathbf{w}$. (Recall that
these are eventually needed by the full Gaussian sampling protocol $\pi_{\mathrm{GS}}$.) Note that once the sharing of $\mathbf{P}$ is
computed once and for all, the only trapdoor-dependent work is the relatively efficient call to $\mathcal{F}_{\mathrm{Mult}}$.

For analyzing security, the essence of the argument is that the public residue $\mathbf{c} = \mathbf{p}' \bmod \mathbb{Z}^m$ returned by
$\mathcal{F}_{\mathrm{SampZ}}$.rround is (nearly) uniformly random, and hence simulatable without knowing $\mathbf{p}'$, because $\mathbf{p}'$ is drawn
from a Gaussian whose parameter exceeds the smoothing parameter of $\mathbb{Z}^m$. All the remaining functionalities
simply return independent and properly blinded shares of intermediate values to their respective owners, and
so are trivial to simulate. Therefore, we omit a formal simulator and security proof for $\pi_{\mathrm{Perturb}}$, which are
tedious (though straightforward given the above intuition).




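Protocol $\pi_{\mathrm{SampZ}}$ in the $\mathcal{F}_{\mathrm{Blind}}$-hybrid model

Sample: On input (sample, sid, $h \times w$, $z$, $d$), party $i$ does:

• With the other parties, run an inverse sampling protocol (see A.1 for elaboration) to generate private
output $[\![\mathbf{X}]\!]^i$, where $\mathbf{X} \leftarrow D_{\mathbb{Z}, z \cdot \omega_n}^{h \times w}$.

• Output (sample, sid, $[\![\mathbf{X}]\!]^i$).

Sample coset: On input (cosetsample, sid, $v \in q^{-j}\mathbb{Z}/q^{-j+1}\mathbb{Z}$, $z > q^{-j+1}$), party $i$ does:

• Call (sample, sid, $1 \times 1$, $q^j z$, $j + 1$) and receive (sample, sid, $q^j [\![x]\!]^i$) with $x \in q^{-j}\mathbb{Z}$.

• Call $\mathcal{F}_{\mathrm{Blind}}$(blind, sid, $[\![x]\!]^i \bmod \mathbb{Z}$) and receive (blind, sid, $[\![c]\!]^i$).

• Announce $[\![c]\!]^i$ and reconstruct $c$ from other parties’ announced shares.

• If $c \bmod q^{-j+1} = v$, output (cosetsample, sid, $[\![x]\!]^i$). Otherwise, repeat.

Randomized round: On input (rround, sid, $[\![x]\!]^i \in q^{-d}\mathbb{Z}/q\mathbb{Z}$), party $i$ does:

• Call $\mathcal{F}_{\mathrm{Blind}}$(blind, sid, $[\![x]\!]^i \bmod \mathbb{Z}$) and receive a fresh share $[\![c]\!]^i$ of $c = x \bmod \mathbb{Z}$. Broadcast $[\![c]\!]^i$.

• Reconstruct $c \in q^{-d}\mathbb{Z}/\mathbb{Z}$ from the announced shares.

• Let $v = c$ and $[\![z]\!]^i = [\![x]\!]^i$. For $j = d, \ldots, 1$:

- Call (cosetsample, sid, $-v \bmod q^{-j+1}\mathbb{Z}$, $\sqrt{d}$) as a subroutine, and receive back
(cosetsample, sid, $[\![x']\!]^i \in q^{-j}\mathbb{Z}/q\mathbb{Z}$, $c' \in q^{-j}\mathbb{Z}/\mathbb{Z}$).

- Let $v \leftarrow v + c' \in q^{-j+1}\mathbb{Z}/\mathbb{Z}$ and $[\![z]\!]^i \leftarrow [\![z]\!]^i + [\![x']\!]^i \in q^{-j}\mathbb{Z}/q\mathbb{Z}$, which is an $i$th share of
$z + x' \in q^{-j+1}\mathbb{Z}/q\mathbb{Z}$.

- Truncate $[\![z]\!]^i$ to lie in $q^{-j+1}\mathbb{Z}/q\mathbb{Z}$ (without changing the underlying shared value) as described
in Section 2.3, i.e., let $[\![z]\!]^i \leftarrow [\![z]\!]^i - ([\![z]\!]^i \bmod q^{-j+1})$.

• Output (rround, sid, $[\![z]\!]^i$, $c$).

Figure 22: Integer sampling protocol

Protocol $\pi_{\mathrm{Perturb}}$ in the ($\mathcal{F}_{\mathrm{Blind}}$, $\mathcal{F}_{\mathrm{Mult}}$, $\mathcal{F}_{\mathrm{SampZ}}$)-hybrid model

In what follows, define the quotient ring $G_d = q^{-d}\mathbb{Z}/q\mathbb{Z}$.

Initialize: On input (init, sid, $\mathbf{A}_{\mathbf{H}^*}$, $[\![\mathbf{R}]\!]^i$, $s$, $B$), party $i$ does:

• With the other parties, run a statistically secure multiparty computation protocol to compute (as a
private output) $[\![\mathbf{P}]\!]^i$ shared over $G_d$, where $\Sigma_{\mathbf{p}} = s^2 - s_g^2 \left[\begin{smallmatrix}\mathbf{R}\\ \mathbf{I}\end{smallmatrix}\right] [\,\mathbf{R}^t \;\; \mathbf{I}\,]$, and $\mathbf{P} \approx \sqrt{\Sigma_{\mathbf{p}} - d^2 \cdot \omega_n^2}$.

• Locally store sid, $\mathbf{A}_{\mathbf{H}^*}$, and $[\![\mathbf{P}]\!]^i$, and initialize a local queue $Q$.

• While $Q$ has fewer than $B$ entries:

- Call $\mathcal{F}_{\mathrm{SampZ}}$(sample, sid, $m \times 1$, 1, $d + 1$) and receive (sample, sid, $[\![\mathbf{z}]\!]^i$) for some $\mathbf{z} \leftarrow D_{\mathbb{Z}, \omega_n}^{m}$
that is shared over $\mathbb{Z}_{q^{d+1}}$.

- Call $\mathcal{F}_{\mathrm{Mult}}$(mult, sid, $q^d \cdot [\![\mathbf{P}]\!]^i$, $[\![\mathbf{z}]\!]^i$) and receive (mult, sid, $q^d \cdot [\![\mathbf{p}']\!]^i$) where $\mathbf{p}' = \mathbf{P}\mathbf{z} \in G_d^m$.
(Above we are multiplying and dividing shares by $q^d$ simply to compute an isomorphism between
$G_d$ and $\mathbb{Z}_{q^{d+1}}$, because $\mathcal{F}_{\mathrm{Mult}}$ expects to receive and return shares over the latter ring.)

- Call $\mathcal{F}_{\mathrm{SampZ}}$(rround, sid, $[\![\mathbf{p}']\!]^i$) and receive (rround, sid, $[\![\mathbf{p}]\!]^i \in \mathbb{Z}_q^m$, $\mathbf{c} = \mathbf{p}' \bmod \mathbb{Z}^m$), where
$\mathbf{p}$ has distribution $\mathbf{p}' + D_{\mathbb{Z}^m - \mathbf{p}',\, d \cdot \omega_n}$ over $\mathbb{Z}^m$.

- Call $\mathcal{F}_{\mathrm{Blind}}$(blind, sid, $[\mathbf{0} \mid \mathbf{G}] \cdot [\![\mathbf{p}]\!]^i$) and $\mathcal{F}_{\mathrm{Blind}}$(blind, sid, $\mathbf{A}_{\mathbf{H}^*} \cdot [\![\mathbf{p}]\!]^i$) and receive back $[\![\bar{\mathbf{w}}]\!]^i$
and $[\![\mathbf{w}]\!]^i$, respectively. Broadcast these shares.

- Reconstruct $\bar{\mathbf{w}}$ and $\mathbf{w}$ from the announced shares, and put ($[\![\mathbf{p}]\!]^i$, $\bar{\mathbf{w}}$, $\mathbf{w}$) in local queue $Q$.

• Output (init, sid).

Perturb: On input (perturb, sid), if fewer than $B$ calls to perturb have already been made, party $i$ does:

• Dequeue ($[\![\mathbf{p}]\!]^i$, $\bar{\mathbf{w}}$, $\mathbf{w}$) from local queue $Q$.

• Output (perturb, sid, $[\![\mathbf{p}]\!]^i$, $\bar{\mathbf{w}}$, $\mathbf{w}$).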

Figure  23:  Perturbation  protocol 
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Abstract 

Gentry’s  “bootstrapping”  technique  (STOC  2009)  constructs  a  fully  homomorphic  encryption  (FHE) 
scheme  from  a  “somewhat  homomorphic”  one  that  is  powerful  enough  to  evaluate  its  own  decryption 
function. To date, it remains the only known way of obtaining unbounded FHE. Unfortunately, bootstrapping
is computationally very expensive, despite the great deal of effort that has been spent on improving
its efficiency. The current state of the art, due to Gentry, Halevi, and Smart (PKC 2012), is able to
bootstrap “packed” ciphertexts (which encrypt up to a linear number of bits) in time only quasilinear
$\tilde{O}(\lambda) = \lambda \cdot \log^{O(1)} \lambda$ in the security parameter. While this performance is asymptotically optimal up to
logarithmic  factors,  the  practical  import  is  less  clear:  the  procedure  composes  multiple  layers  of  expensive 
and  complex  operations,  to  the  point  where  it  appears  very  difficult  to  implement,  and  its  concrete  runtime 
appears  worse  than  those  of  prior  methods  (all  of  which  have  quadratic  or  larger  asymptotic  runtimes). 

In this work we give simple, practical, and entirely algebraic algorithms for bootstrapping in quasilinear
time, for both “packed” and “non-packed” ciphertexts. Our methods are easy to implement (especially
in  the  non-packed  case),  and  we  believe  that  they  will  be  substantially  more  efficient  in  practice  than 
all  prior  realizations  of  bootstrapping.  One  of  our  main  techniques  is  a  substantial  enhancement  of  the 
“ring-switching”  procedure  of  Gentry  et  al.  (SCN  2012),  which  we  extend  to  support  switching  between 
two  rings  where  neither  is  a  subring  of  the  other.  Using  this  procedure,  we  give  a  natural  method  for 
homomorphically  evaluating  a  broad  class  of  structured  linear  transformations,  including  one  that  lets  us 
evaluate  the  decryption  function  efficiently. 
1 Introduction


Bootstrapping, a central technique from the breakthrough work of Gentry [Gen09b, Gen09a] on fully
homomorphic encryption (FHE), converts a sufficiently powerful “somewhat homomorphic” encryption (SHE)
scheme into a fully homomorphic one. (An SHE scheme can support a bounded number of homomorphic
operations on freshly generated ciphertexts, whereas an FHE scheme has no such bound.) In short,
bootstrapping works by homomorphically evaluating the SHE scheme’s decryption function on a ciphertext that
cannot support any further homomorphic operations. This has the effect of “refreshing” the ciphertext, i.e., it
produces a new one that encrypts the same message and can handle more homomorphic operations.
Bootstrapping remains the only known way to achieve unbounded FHE, i.e., a scheme that can homomorphically
evaluate any efficient function using keys and ciphertexts of a fixed size.¹

In order to be “bootstrappable,” an SHE scheme must be powerful enough to homomorphically evaluate
its own decryption function, using whatever homomorphic operations it supports. For security reasons, the
key and ciphertext sizes of all known SHE schemes grow with the depth and, to a lesser extent, the size of
the functions that they can homomorphically evaluate. For instance, under plausible hardness conjectures,
the key and ciphertext sizes of the most efficient SHE scheme to date [BGV12] grow quasilinearly in both
the supported multiplicative depth $d$ and the security parameter $\lambda$, i.e., as $\tilde{O}(d \cdot \lambda)$. Clearly, the runtime of
bootstrapping  must  also  grow  with  the  sizes  of  the  keys,  ciphertexts,  and  decryption  function.  This  runtime  is 
perhaps  the  most  important  measure  of  efficiency  for  FHE,  because  bootstrapping  is  currently  the  biggest 
bottleneck  by  far  in  instantiations,  both  in  theory  and  in  practice. 

The  past  few  years  have  seen  an  intensive  study  of  different  forms  of  decryption  procedures  for  SHE 
schemes, and their associated bootstrapping operations [Gen09b, Gen09a, vDGHV10, GH11b, BV11a,
GH11a, BGV12, GHS12b]. The first few bootstrapping methods had moderate polynomial runtimes in the
security parameter $\lambda$, e.g., $\tilde{O}(\lambda^4)$. Brakerski, Gentry, and Vaikuntanathan [BGV12] gave a major efficiency
improvement, reducing the runtime to $\tilde{O}(\lambda^2)$. They also gave an amortized method that bootstraps $\tilde{\Omega}(\lambda)$
ciphertexts at once in $\tilde{O}(\lambda^2)$ time, i.e., quasilinear runtime per ciphertext. However, these results apply only
to “non-packed” ciphertexts, i.e., ones that encrypt essentially just one bit each, which combined with the
somewhat large runtimes makes these methods too inefficient to be used very much in practice. Most recently,
Gentry, Halevi, and Smart [GHS12a] achieved bootstrapping for “packed” ciphertexts (i.e., ones that encrypt
up to $\tilde{\Omega}(\lambda)$ bits each) in quasilinear $\tilde{O}(\lambda)$ runtime, which is asymptotically optimal in space and time, up to
polylogarithmic factors. For this they relied on a general “compiler” from another work of theirs [GHS12b],
which  achieved  SHE/FHE  for  sufficiently  wide  circuits  with  polylogarithmic  multiplicative  “overhead,”  i.e., 
cost  relative  to  evaluating  the  circuit  “in  the  clear.” 

Bootstrapping  and  FHE  in  quasi-optimal  time  and  space  is  a  very  attractive  and  powerful  theoretical  result. 
However, the authors of [GHS12b, GHS12a] caution that their constructions may have limited potential for
use in practice, for two main reasons: first, the runtimes, while asymptotically quasilinear, include very large
polylogarithmic factors. For realistic values of the security parameter, these polylogarithmic terms exceed
the rather small (but asymptotically worse) quasilinear overhead obtained in [BGV12]. The second reason
is that their bootstrapping operation is algorithmically very complex and difficult to implement (see the
next paragraphs for details). Indeed, while there are now a few working implementations of bootstrapping
(e.g., [GH11b, CCK+13]) that follow the templates from [Gen09b, Gen09a, vDGHV10, BGV12], we are
not aware of any attempt to implement any method having subquadratic runtime.

¹This stands in contrast with leveled FHE schemes, which can homomorphically evaluate a function of any a priori bounded
depth, but using keys and ciphertexts whose sizes depend on the bound. Leveled FHE can be constructed without resorting to
bootstrapping [BGV12].
Is quasilinear efficient? The complexity and large practical overhead of the constructions in [GHS12b,
GHS12a] arise from two kinds of operations. First, the main technique from [GHS12b] is a way of homomorphically
evaluating any sufficiently shallow and wide arithmetic circuit on a “packed” ciphertext that encrypts
a high-dimensional vector of plaintexts in multiple “slots.” It works by first using ring automorphisms
and key-switching operations [BV11a, BGV12] to obtain a small, fixed set of “primitive” homomorphic
permutations  on  the  slots.  It  then  composes  those  permutations  (along  with  other  homomorphic  operations) 
in  a  log-depth  permutation  network,  to  obtain  any  permutation.  Finally,  it  homomorphically  evaluates  the 
desired  circuit  by  combining  appropriate  permutations  with  relatively  simple  homomorphic  slot-selection  and 
ring  operations. 

In the context of bootstrapping, one of the key observations from [GHS12a] is that a main step of the
decryption procedure can be evaluated using the above technique. Specifically, they need an operation
that moves the coefficients of an encrypted plaintext polynomial, reduced modulo a cyclotomic polynomial
$\Phi_m(X)$, into the slots of a packed ciphertext (and back again). Once the coefficients are in the slots, they
can be rounded in a batched (SIMD) fashion, and then mapped back to coefficients of the plaintext. The
operations that move the coefficients into slots and vice-versa can be expressed as $O(\log \lambda)$-depth arithmetic
circuits of size $O(\lambda \log \lambda)$, roughly akin to the classic FFT butterfly network. Hence they can be evaluated
homomorphically with polylogarithmic overhead, using [GHS12b]. However, as the authors of [GHS12a]
point out, the decryption circuit is quite large and complex, especially the part that moves the slots back to
the coefficients, because it involves reduction modulo $\Phi_m(X)$ for an $m$ having several prime divisors. This
modular reduction is the most expensive part of the decryption circuit, and avoiding it is one of the main
open problems given in [GHS12a]. However, even a very efficient decryption circuit would still incur the
large polylogarithmic overhead factors from the techniques of [GHS12b].

1.1  Our  Contributions 

We give a new bootstrapping algorithm that runs in quasilinear $\tilde{O}(\lambda)$ time per ciphertext with small
polylogarithmic factors, and is algorithmically much simpler than previous methods. It is easy to implement,
and  we  believe  that  it  will  be  substantially  more  efficient  in  practice  than  all  prior  methods.  We  provide 
a  unified  bootstrapping  procedure  that  works  for  both  “non-packed”  ciphertexts  (which  encrypt  integers 
modulo  some  p,  e.g.,  bits)  and  “packed”  ciphertexts  (which  encrypt  elements  of  a  high-dimensional  ring),  and 
also  interpolates  between  the  two  cases  to  handle  an  intermediate  concept  we  call  “semi-packed”  ciphertexts. 

Our  procedure  for  non-packed  ciphertexts  is  especially  simple  and  efficient.  In  particular,  it  can  work  very 
naturally using only cyclotomic rings having power-of-two index, i.e., rings of the form $\mathbb{Z}[X]/(1 + X^{2^k})$,
which admit very fast implementations. This improves upon the method of [BGV12], which achieves
quasilinear amortized runtime when bootstrapping $\tilde{\Omega}(\lambda)$ non-packed ciphertexts at once. Also, while that
method can also use power-of-two cyclotomics, it can only do so by emulating $\mathbb{Z}_2$ (bit) arithmetic within $\mathbb{Z}_p$
for some moderately large prime $p$, which translates additions in $\mathbb{Z}_2$ into much more costly multiplications in
$\mathbb{Z}_p$. By contrast, our method works “natively” with any plaintext modulus.

For packed ciphertexts, our procedure draws upon high-level ideas from [GHS12b, GHS12a], but our
approach is conceptually and technically very different. Most importantly, it completely avoids the two main
inefficiencies from those works: first, unlike [GHS12b], we do not use permutation networks or any explicit
permutations of the plaintext slots, nor do we rely on a general-purpose compiler for homomorphically
evaluating arithmetic circuits. Instead, we give direct, practically efficient procedures for homomorphically
mapping the coefficients of an encrypted plaintext element into slots and vice-versa. In particular, our
procedure does not incur the large cost or algorithmic complexity of homomorphically reducing modulo
$\Phi_m(X)$, which was the main bottleneck in the decryption circuit of [GHS12a].


At  a  higher  level,  our  bootstrapping  method  has  two  other  attractive  and  novel  features:  first,  it  is  entirely 
“algebraic,”  by  which  we  mean  that  the  full  procedure  (including  generation  of  all  auxiliary  data  it  uses) 
can  be  described  as  a  short  sequence  of  elementary  operations  from  the  “native  instruction  set”  of  the  SHE 
scheme.  By  contrast,  all  previous  methods  at  some  point  invoke  rather  generic  arithmetic  circuits,  e.g.,  for 
modular addition of values represented as bit strings, or reduction modulo a cyclotomic polynomial $\Phi_m(X)$.
Of  course,  arithmetic  circuits  can  be  evaluated  using  the  SHE  scheme’s  native  operations,  but  we  believe 
that  the  distinction  between  “algebraic”  and  “non-algebraic”  is  an  important  qualitative  one,  and  it  certainly 
affects  the  simplicity  and  concrete  efficiency  of  the  bootstrapping  procedure. 

The  second  nice  feature  of  our  method  is  that  it  completely  decouples  the  algebraic  structure  of  the 
SHE  plaintext  ring  from  that  which  is  needed  by  the  bootstrapping  procedure.  In  previous  methods  that  use 
amortization (or “batching”) for efficiency (e.g., [SV11, BGV12, GHS12a]), the ring and plaintext modulus
of the SHE scheme must be chosen so as to provide many plaintext slots. However, this structure may not
always be a natural match for the SHE application’s efficiency or functionality requirements. For example, the
lattice-based pseudorandom function of [BPR12] works very well with a ring $R_q = \mathbb{Z}_q[X]/(X^n + 1)$ where
both $q$ and $n$ are powers of two, but for such parameters $R_q$ has only one slot. Our method can bootstrap even
for  this  kind  of  plaintext  ring  (and  many  others),  while  still  using  batching  to  achieve  quasilinear  runtime. 

1.2  Techniques 

At  the  heart  of  our  bootstrapping  procedure  are  two  novel  homomorphic  operations  for  SHE  schemes  over 
cyclotomic rings: for non-packed (or semi-packed) ciphertexts, we give an operation that isolates the
message-carrying coefficient(s) of a high-dimensional ring element; and for (semi-)packed ciphertexts, we give an
operation that maps coefficients to slots and vice-versa.

Isolating coefficients. Our first homomorphic operation is most easily explained in the context of non-packed
ciphertexts, which encrypt single elements of the quotient ring $\mathbb{Z}_p$ for some small modulus $p$, using
ciphertexts over some cyclotomic quotient ring $R_q = R/qR$ of moderately large degree $d = \deg(R/\mathbb{Z}) =
\tilde{O}(\lambda)$. We first observe that a ciphertext to be bootstrapped can be reinterpreted as an encryption of an
$R_q$-element, one of whose $\mathbb{Z}_q$-coefficients (with respect to an appropriate basis of the ring) “noisily” encodes
the message, and whose other coefficients are just meaningless noise terms. We give a simple and efficient
homomorphic operation that preserves the meaningful coefficient, and maps all the others to zero. Having
isolated the message-encoding coefficient, we can then homomorphically apply an efficient integer “rounding”
function (see [GHS12a] and Appendix B) to recover the message from its noisy encoding, which completes
the bootstrapping procedure. (Note that it is necessary to remove the meaningless noise coefficients first,
otherwise they would interfere with the correct operation of the rounding function.)

Our coefficient-isolating procedure works essentially by applying the trace function $\mathrm{Tr}_{R/\mathbb{Z}} \colon R \to \mathbb{Z}$
to the plaintext. The trace is the “canonical” $\mathbb{Z}$-linear function from $R$ to $\mathbb{Z}$, and it turns out that for the
appropriate choice of $\mathbb{Z}$-basis of $R$ used in decryption, the trace simply outputs (up to some scaling factor)
the message-carrying coefficient we wish to isolate. One simple and very efficient way of applying the trace
homomorphically is to use the “ring-switching” technique of [GHPS12], but unfortunately, this requires the
ring-LWE problem [LPR10] to be hard over the target ring $\mathbb{Z}$, which is clearly not the case. Another way
follows from the fact that $\mathrm{Tr}_{R/\mathbb{Z}}$ equals the sum of all $d$ automorphisms of $R$; therefore, it can be computed
by homomorphically applying each automorphism and summing the results. Unfortunately, this method takes
at least quadratic $\Omega(\lambda^2)$ time, because applying each automorphism homomorphically takes $\Omega(\lambda)$ time, and
there are $d = \Omega(\lambda)$ automorphisms.
So, instead of inefficiently computing the trace by summing all the automorphisms at once, we consider
a tower of cyclotomic rings $\mathbb{Z} = R^{(0)} \subseteq R^{(1)} \subseteq \cdots \subseteq R^{(r)} = R$, usually written as $R^{(r)}/\cdots/R^{(1)}/R^{(0)}$.
Then $\mathrm{Tr}_{R/\mathbb{Z}}$ is the composition of the individual trace functions $\mathrm{Tr}_{R^{(i)}/R^{(i-1)}} \colon R^{(i)} \to R^{(i-1)}$, and these
traces are equal to the sums of all automorphisms of $R^{(i)}$ that fix $R^{(i-1)}$ pointwise, of which there are exactly
$d_i = \deg(R^{(i)}/R^{(i-1)}) = \deg(R^{(i)}/\mathbb{Z})/\deg(R^{(i-1)}/\mathbb{Z})$. We can therefore compute each $\mathrm{Tr}_{R^{(i)}/R^{(i-1)}}$
in time linear in $\lambda$ and in $d_i$; moreover, the number of trace functions to apply is at most logarithmic in
$d = \deg(R/\mathbb{Z}) = \tilde{O}(\lambda)$, because each one reduces the degree by a factor of at least two. Therefore, by
ensuring that the degrees of $R^{(r)}, R^{(r-1)}, \ldots, R^{(0)}$ decrease gradually enough, we can homomorphically
apply the full $\mathrm{Tr}_{R/\mathbb{Z}}$ in quasilinear time. For example, a particularly convenient choice is to let $R^{(i)}$ be
the $2^{i+1}$st cyclotomic ring $\mathbb{Z}[X]/(1 + X^{2^i})$ of degree $2^i$, so that every $d_i = 2$, and there are exactly
$\log_2(d) = O(\log \lambda)$ trace functions to apply.
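The tower idea can be illustrated on plaintexts with a minimal sketch (it is not homomorphic and is not the authors' implementation; all function names are illustrative assumptions). For $R = \mathbb{Z}[X]/(1 + X^n)$ with $n$ a power of two, it compares the naive trace, which sums all $n$ automorphisms $X \mapsto X^k$ for odd $k$, against $\log_2(n)$ two-term traces $a + \tau(a)$, where $\tau \colon X \mapsto -X$ is the one nontrivial automorphism fixing the subring generated by $X^2$.

```python
def negacyclic_pow(n, i):
    """Return (index, sign) with X^i = sign * X^index in Z[X]/(X^n + 1)."""
    i %= 2 * n
    return (i, 1) if i < n else (i - n, -1)


def apply_automorphism(a, k):
    """Apply X -> X^k (k odd) to the coefficient list a of length n."""
    n = len(a)
    out = [0] * n
    for j, c in enumerate(a):
        idx, sign = negacyclic_pow(n, j * k)
        out[idx] += sign * c
    return out


def trace_naive(a):
    """Sum of all n automorphisms: n applications, each costing O(n)."""
    n = len(a)
    total = [0] * n
    for k in range(1, 2 * n, 2):
        image = apply_automorphism(a, k)
        total = [t + v for t, v in zip(total, image)]
    return total


def trace_one_level(a):
    """Trace to the subring in Y = X^2: a + tau(a) keeps doubled even coefficients."""
    return [2 * c for c in a[0::2]]


def trace_tower(a):
    """Compose log2(n) degree-2 traces down to Z; returns (trace, number of levels)."""
    levels = 0
    while len(a) > 1:
        a = trace_one_level(a)
        levels += 1
    return a[0], levels


a = [3, 1, 4, 1, 5, 9, 2, 6]   # an element of Z[X]/(X^8 + 1)
print(trace_naive(a))          # only the constant coefficient survives
print(trace_tower(a))
```

Both routes return $n$ times the constant coefficient, but the tower touches each coefficient only a logarithmic number of times, which mirrors the cost argument above.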

More generally, when bootstrapping a semi-packed ciphertext we start with a plaintext value in $R_q$ that
noisily encodes a message in $S_p$, for some subring $S \subseteq R$. (The case $S = \mathbb{Z}$ corresponds to a non-packed
ciphertext.) We show that applying the trace function $\mathrm{Tr}_{R/S}$ to the $R_q$-plaintext yields a new plaintext in $S_q$
that noisily encodes the message, thus isolating the meaningful part of the noisy encoding and vanishing
the rest. We then homomorphically apply a rounding function to recover the $S_p$ message from its noisy $S_q$
encoding, which uses the technique described next.

Mapping  coefficients  to  slots.  Our  second  technique,  and  main  technical  innovation,  is  in  bootstrapping 
(semi-)packed ciphertexts. We enhance the recent “ring-switching” procedure of [GHPS12], and use it to
efficiently  move  “noisy”  plaintext  coefficients  (with  respect  to  an  appropriate  decryption  basis)  into  slots 
for  batch-rounding,  and  finally  move  the  rounded  slot  values  back  to  coefficients.  We  note  that  all  previous 
methods  for  loading  plaintext  data  into  slots  used  the  same  ring  for  the  source  and  destination,  and  so  required 
the  plaintext  to  come  from  a  ring  designed  to  have  many  slots.  In  this  work,  we  use  ring-switching  to  go  from 
the  SHE  plaintext  ring  to  a  different  ring  having  many  slots,  which  is  used  only  temporarily  for  batch-rounding. 
This  is  what  allows  the  SHE  plaintext  ring  to  be  decoupled  from  the  rings  used  in  bootstrapping,  as  mentioned 
above. 

To summarize our technique, we first recall the ring-switching procedure of [GHPS12]. It was originally
devised  to  provide  moderate  efficiency  gains  for  SHE/FHE  schemes,  by  allowing  them  to  switch  ciphertexts 
from  high-degree  cyclotomic  rings  to  subrings  of  smaller  degree  (once  enough  homomorphic  operations  have 
been  performed  to  make  this  secure).  We  generalize  the  procedure,  showing  how  to  switch  between  two 
rings  where  neither  ring  need  be  a  subring  of  the  other.  The  procedure  has  a  very  simple  implementation, 
and  as  long  as  the  two  rings  have  a  large  common  subring,  it  is  also  very  efficient  (i.e.,  quasilinear  in  the 
dimension).  Moreover,  it  supports,  as  a  side  effect,  the  homomorphic  evaluation  of  any  function  that  is  linear 
over  the  common  subring.  However,  the  larger  the  common  subring  is,  the  more  restrictive  this  condition  on 
the  function  becomes. 

We  show  how  our  enhanced  ring-switching  can  move  the  plaintext  coefficients  into  the  slots  of  the  target 
ring  (and  back),  which  can  be  seen  as  just  evaluating  a  certain  Z-linear  function.  Here  we  are  faced  with 
the  main  technical  challenge:  for  efficiency,  the  common  subring  of  the  source  and  destination  rings  must 
be  large,  but  then  the  supported  class  of  linear  functions  is  very  restrictive,  and  certainly  does  not  include 
the  Z-linear  one  we  want  to  evaluate.  We  solve  this  problem  by  switching  through  a  short  sequence  of 
“hybrid”  rings,  where  adjacent  rings  have  a  large  common  subring,  but  the  initial  and  final  rings  have  only 
the  integers  Z  in  common.  Moreover,  we  show  that  for  an  appropriately  chosen  sequence  of  hybrid  rings, 
the  Z-linear  function  we  want  to  evaluate  is  realizable  by  a  sequence  of  allowed  linear  functions  between 
adjacent  hybrid  rings.  Very  critically,  this  decomposition  requires  the  SHE  scheme  to  use  a  highly  structured 
basis of the ring for decryption. The usual representation of a cyclotomic ring as $\mathbb{Z}[X]/\Phi_m(X)$ typically
does not correspond to such a basis, so we instead rely on the tensorial decomposition of the ring and its
corresponding bases, as recently explored in [LPR13]. At heart, this is what allows us to avoid the expensive
homomorphic reduction modulo $\Phi_m(X)$, which is one of the main bottlenecks in previous work [GHS12a].²
Stepping  back  a  bit,  the  technique  of  switching  through  hybrid  rings  and  bases  is  reminiscent  of  standard 
“sparse decompositions” for linear transformations like the FFT, in that both decompose a complicated
high-dimensional transform into a short sequence of simpler, structured transforms. (Here, the simple transforms
are  computed  merely  as  a  side-effect  of  passing  through  the  hybrid  rings.)  Because  of  these  similarities,  we 
believe  that  the  enhanced  ring-switching  procedure  will  be  applicable  in  other  domain-specific  applications 
of  homomorphic  encryption,  e.g.,  signal-processing  transforms  or  statistical  analysis. 


Organization. Section 2.1 recalls the extensive algebraic background required for our constructions, and
Section 2.2 recalls a standard ring-based SHE scheme and some of its natural homomorphic operations.
Section 3 defines the general bootstrapping procedure. Sections 4 and 5 respectively fill in the details of the
two novel homomorphic operations used in the bootstrapping procedure. Appendix A documents a folklore
transformation between two essentially equivalent ways of encoding messages in SHE schemes. Appendix B
describes an integer rounding procedure that simplifies the one given in [GHS12a], and Appendix C gives
some concrete choices of rings that our method can use in practice.


Acknowledgments.  We  thank  Oded  Regev  for  helpful  discussions  during  the  early  stages  of  this  research, 
and the anonymous CRYPTO ’13 reviewers for their thoughtful comments.


2  Preliminaries 

For a positive integer $k$, we let $[k] = \{0, \ldots, k - 1\}$. For an integer modulus $q$, we let $\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$
denote the quotient ring of integers modulo $q$. For integers $q, q'$, we define the integer “rounding” function
$\lfloor \cdot \rceil_{q'} \colon \mathbb{Z}_q \to \mathbb{Z}_{q'}$ as $\lfloor x \rceil_{q'} = \lfloor (q'/q) \cdot x \rceil \bmod q'$.
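As a small illustration of this definition, the following sketch computes the rounding function with exact integer arithmetic. The function name `round_mod` and the tie-breaking rule (halves round up) are assumptions made for the example; the text does not fix a tie-breaking convention.

```python
def round_mod(x, q, q_prime):
    """Return round((q'/q) * x) mod q', with ties rounded up, using only integers."""
    x %= q
    # floor(y + 1/2) for y = q' * x / q, written over the common denominator 2q
    return ((2 * q_prime * x + q) // (2 * q)) % q_prime


# A bit mu "noisily" encoded as mu * (q/2) + e is recovered by rounding to q' = 2,
# as long as the noise e is small compared with q/4.
q = 1024
for mu in (0, 1):
    noisy = (mu * (q // 2) - 37) % q
    print(mu, round_mod(noisy, q, 2))
```

This is the plaintext analogue of the rounding step that bootstrapping applies homomorphically once the message-carrying coefficient has been isolated.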

2.1  Algebraic  Background 

Throughout this work, by “ring” we mean a commutative ring with identity. For two rings $R \subseteq R'$, an
$R$-basis of $R'$ is a set $B \subset R'$ such that every $r \in R'$ can be written uniquely as an $R$-linear combination
of elements of $B$. For two rings $R, S$ with a common subring $E$, an $E$-linear function $L \colon R \to S$ is one
for which $L(r + r') = L(r) + L(r')$ for all $r, r' \in R$, and $L(e \cdot r) = e \cdot L(r)$ for all $e \in E$, $r \in R$. It is
immediate that such a function is defined uniquely by its values on any $E$-basis of $R$.

2.1.1  Cyclotomic  Rings 

For a positive integer $m$ called the index, let $\mathcal{O}_m = \mathbb{Z}[\zeta_m]$ denote the $m$th cyclotomic ring, where $\zeta_m$ is an
abstract element of order $m$ over $\mathbb{Q}$. (In particular, we do not view $\zeta_m$ as any particular complex root of unity.)
The minimal polynomial of $\zeta_m$ over $\mathbb{Q}$ is the $m$th cyclotomic polynomial $\Phi_m(X) = \prod_{i \in \mathbb{Z}_m^*} (X - \omega_m^i) \in \mathbb{Z}[X]$,
where $\omega_m = \exp(2\pi\sqrt{-1}/m) \in \mathbb{C}$ is the principal $m$th complex root of unity, and the roots $\omega_m^i \in \mathbb{C}$ range
over all the primitive complex $m$th roots of unity. Therefore, $\mathcal{O}_m$ is a ring extension of degree $n = \varphi(m)$
over $\mathbb{Z}$. (In particular, $\mathcal{O}_1 = \mathcal{O}_2 = \mathbb{Z}$.) Clearly, $\mathcal{O}_m$ is isomorphic to the polynomial ring $\mathbb{Z}[X]/\Phi_m(X)$ by
identifying $\zeta_m$ with $X$, and has the “power basis” $\{1, \zeta_m, \ldots, \zeta_m^{n-1}\}$ as a $\mathbb{Z}$-basis. However, for
non-prime-power $m$ the power basis can be somewhat cumbersome and inefficient to work with. In Section 2.1.4 we
consider other, more structured bases that are essential to our techniques.

²The use of more structured representations of cyclotomic rings in [LPR13] was initially motivated by the desire for simpler and
more efficient algorithms for cryptographic operations. Interestingly, these representations yield moderate efficiency improvements
for computations “in the clear,” but dramatic benefits for their homomorphic counterparts!
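For readers who want to experiment with these objects, here is a minimal sketch that computes $\Phi_m(X)$ from the standard identity $X^m - 1 = \prod_{d \mid m} \Phi_d(X)$ and checks that its degree is $\varphi(m)$. The helper names are illustrative assumptions, and the recursion is meant for small $m$ only.

```python
from math import gcd


def poly_divexact(num, den):
    """Exact division of integer polynomials (coefficients lowest degree first).

    The divisor must be monic; an assertion fails if a remainder is left over.
    """
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + len(den) - 1]
        for j, d in enumerate(den):
            num[i + j] -= quot[i] * d
    assert not any(num), "division was not exact"
    return quot


def cyclotomic(m):
    """Phi_m(X): divide X^m - 1 by Phi_d(X) for every proper divisor d of m."""
    poly = [-1] + [0] * (m - 1) + [1]
    for d in range(1, m):
        if m % d == 0:
            poly = poly_divexact(poly, cyclotomic(d))
    return poly


def totient(m):
    """Euler's totient, by direct counting."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


print(cyclotomic(8))    # power-of-two index: 1 + X^4
print(cyclotomic(12))   # 1 - X^2 + X^4
```

For a power-of-two index the result is the sparse polynomial $1 + X^{m/2}$, which is one reason such rings admit fast implementations; for indices with several prime factors the polynomial is denser.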

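If $m \mid m'$, we can view the $m$th cyclotomic ring $\mathcal{O}_m$ as a subring of $\mathcal{O}_{m'} = \mathbb{Z}[\zeta_{m'}]$, via the ring embedding
(i.e., injective ring homomorphism) that maps $\zeta_m$ to $\zeta_{m'}^{m'/m}$. The ring extension $\mathcal{O}_{m'}/\mathcal{O}_m$ has degree
$d = \varphi(m')/\varphi(m)$, and also $d$ automorphisms $\tau_i$ (i.e., automorphisms of $\mathcal{O}_{m'}$ that fix $\mathcal{O}_m$ pointwise),
which are defined by $\tau_i(\zeta_{m'}) = \zeta_{m'}^i$ for each $i \in \mathbb{Z}_{m'}^*$ such that $i = 1 \pmod{m}$. The trace function
$\mathrm{Tr} = \mathrm{Tr}_{\mathcal{O}_{m'}/\mathcal{O}_m} \colon \mathcal{O}_{m'} \to \mathcal{O}_m$ can be defined as the sum of these automorphisms:

$$\mathrm{Tr}_{\mathcal{O}_{m'}/\mathcal{O}_m}(a) = \sum_i \tau_i(a) \in \mathcal{O}_m.$$

Notice that Tr is $\mathcal{O}_m$-linear by definition. If $\mathcal{O}_{m''}/\mathcal{O}_{m'}/\mathcal{O}_m$ is a tower of ring extensions, then the trace
satisfies the composition property $\mathrm{Tr}_{\mathcal{O}_{m''}/\mathcal{O}_m} = \mathrm{Tr}_{\mathcal{O}_{m'}/\mathcal{O}_m} \circ \mathrm{Tr}_{\mathcal{O}_{m''}/\mathcal{O}_{m'}}$.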
An important element in the $m$th cyclotomic ring is

$$g := \prod_{\text{odd prime } p \mid m} (1 - \zeta_p) \in \mathcal{O}_m. \qquad (2.1)$$

Also define $\hat{m} = m/2$ if $m$ is even, otherwise $\hat{m} = m$, for any cyclotomic index $m$. It is known that $g \mid \hat{m}$
(see, e.g., [LPR13, Section 2.5.4]). The following lemma shows how the elements $g$ in different cyclotomic
rings, and the ideals they generate, are related by the trace function.

Lemma 2.1. Let $m \mid m'$ be positive integers and let $g \in R = \mathcal{O}_m$, $g' \in R' = \mathcal{O}_{m'}$ and $\hat{m}, \hat{m}'$ be as defined
above. Then $\mathrm{Tr}_{R'/R}(g'R') = (\hat{m}'/\hat{m}) \cdot gR$, and in particular, $\mathrm{Tr}_{R'/R}(g') = (\hat{m}'/\hat{m}) \cdot g$.

Later on we use the scaled trace function $(\hat{m}/\hat{m}')\,\mathrm{Tr}_{R'/R}$, which by the above lemma maps the ideal $g'R'$
to $gR$, and $g'$ to $g$.

Proof. Let $\mathrm{Tr} = \mathrm{Tr}_{R'/R}$. To prove the first claim, we briefly recall certain properties of $R^\vee$, the fractional
ideal “dual” to $R$; see [LPR13, Section 2.5.4] for further details. First, $R^\vee = (g/\hat{m})R$, and similarly
$(R')^\vee = (g'/\hat{m}')R'$. It also follows directly from the definition of the dual ideal that $\mathrm{Tr}((R')^\vee) = R^\vee$; see
for example [GHPS12, Equation 2.2]. Therefore, $\mathrm{Tr}(g'R') = (\hat{m}'/\hat{m}) \cdot gR$.

For the second claim, we first show the effect of the trace on $g'$ when $m' = m \cdot p$ for some prime $p$.
If $p$ divides $m$, then $\hat{m}'/\hat{m} = m'/m = p$, the degree of $R'/R$ is $\varphi(m')/\varphi(m) = p$, and $g' = g \in R$, so
$\mathrm{Tr}(g') = \mathrm{Tr}(g) = p \cdot g$. Now suppose $p$ does not divide $m$. If $p = 2$, then $m$ is odd and $m'$ is even, so
$\hat{m}'/\hat{m} = (m'/2)/m = 1$, the degree of $R'/R$ is 1, and $g' = g \in R$, so $\mathrm{Tr}(g') = g$. Otherwise $p$ is odd, so
$\hat{m}'/\hat{m} = m'/m = p$ and $g' = (1 - \zeta_p)g$. Therefore $\mathrm{Tr}(g') = \mathrm{Tr}(1 - \zeta_p) \cdot g = p \cdot g$, where the final equality
follows from $\mathrm{Tr}(1) = p - 1$ and $\mathrm{Tr}(\zeta_p) = \zeta_p^1 + \zeta_p^2 + \cdots + \zeta_p^{p-1} = -1$.

The general case follows from the composition property of the trace, by iteratively applying the above
case to any cyclotomic tower $R^{(r)}/R^{(r-1)}/\cdots/R^{(0)}$, where $R^{(r)} = R'$ and $R^{(0)} = R$, and the ratio of the
indices of $R^{(i)}, R^{(i-1)}$ is prime for every $i = 1, \ldots, r$. □
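The last equality in the odd-prime case can be checked numerically. The sketch below is only an illustration with hypothetical helper names: it works in $\mathbb{Z}[\zeta_p]$ for an odd prime $p$ (that is, with base ring $\mathbb{Z}$), represents elements in the power basis $1, \zeta_p, \ldots, \zeta_p^{p-2}$, and sums the $p - 1$ automorphisms $\zeta_p \mapsto \zeta_p^i$.

```python
def reduce_to_power_basis(full, p):
    """Rewrite sum_j full[j] * zeta^j, j in [0, p), in the basis 1, ..., zeta^(p-2).

    Uses zeta^(p-1) = -(1 + zeta + ... + zeta^(p-2)).
    """
    top = full[p - 1]
    return [c - top for c in full[:p - 1]]


def automorphism(a, i, p):
    """Apply zeta -> zeta^i to a, given in the power basis (length p - 1)."""
    full = [0] * p
    for j, c in enumerate(a):
        full[(i * j) % p] += c
    return reduce_to_power_basis(full, p)


def trace(a, p):
    """Trace from Z[zeta_p] to Z: the sum of the automorphisms for i = 1, ..., p - 1."""
    total = [0] * (p - 1)
    for i in range(1, p):
        total = [t + v for t, v in zip(total, automorphism(a, i, p))]
    return total


p = 5
one = [1, 0, 0, 0]
zeta = [0, 1, 0, 0]
one_minus_zeta = [1, -1, 0, 0]
print(trace(one, p), trace(zeta, p), trace(one_minus_zeta, p))
```

The three traces come out as $p - 1$, $-1$, and $p$ respectively (as constant elements), matching the values used in the proof.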
2.1.2 Tensorial Decomposition of Cyclotomics


An important fact from algebraic number theory, used centrally in this work (and in [LPR13]), is the tensorial
decomposition of cyclotomic rings (and their bases) in terms of subrings. Let $\mathcal{O}_{m_1}, \mathcal{O}_{m_2}$ be cyclotomic
rings. Then their largest common subring is $\mathcal{O}_{m_1} \cap \mathcal{O}_{m_2} = \mathcal{O}_g$ where $g = \gcd(m_1, m_2)$, and their smallest
common extension ring, called the compositum, is $\mathcal{O}_{m_1} + \mathcal{O}_{m_2} = \mathcal{O}_l$ where $l = \mathrm{lcm}(m_1, m_2)$. When
considered as extensions of $\mathcal{O}_g$, the ring $\mathcal{O}_l$ is isomorphic to the ring tensor product of $\mathcal{O}_{m_1}$ and $\mathcal{O}_{m_2}$, written
as (sometimes suppressing $\mathcal{O}_g$ when it is clear from context)

$$\mathcal{O}_l/\mathcal{O}_g \cong (\mathcal{O}_{m_1}/\mathcal{O}_g) \otimes (\mathcal{O}_{m_2}/\mathcal{O}_g).$$

On the right, the ring tensor product is defined as the set of all $\mathcal{O}_g$-linear combinations of pure tensors $a_1 \otimes a_2$,
with ring operations defined by $\mathcal{O}_g$-bilinearity:

$$(a_1 \otimes a_2) + (b_1 \otimes a_2) = (a_1 + b_1) \otimes a_2,$$
$$(a_1 \otimes a_2) + (a_1 \otimes b_2) = a_1 \otimes (a_2 + b_2),$$
$$c(a_1 \otimes a_2) = (c a_1) \otimes a_2 = a_1 \otimes (c a_2)$$

for any $c \in \mathcal{O}_g$, and the mixed-product property $(a_1 \otimes a_2) \cdot (b_1 \otimes b_2) = (a_1 b_1) \otimes (a_2 b_2)$. The isomorphism
with $\mathcal{O}_l/\mathcal{O}_g$ then simply identifies $a_1 \otimes a_2$ with $a_1 \cdot a_2 \in \mathcal{O}_l$. Note that any $a_1 \in \mathcal{O}_{m_1}$ corresponds to the
pure tensor $a_1 \otimes 1$, and similarly for any $a_2 \in \mathcal{O}_{m_2}$.
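A quick sanity check on this decomposition is that the degrees agree: the degree of $\mathcal{O}_l$ over $\mathcal{O}_g$ should equal the product of the degrees of $\mathcal{O}_{m_1}$ and $\mathcal{O}_{m_2}$ over $\mathcal{O}_g$, i.e., $\varphi(l)/\varphi(g) = (\varphi(m_1)/\varphi(g)) \cdot (\varphi(m_2)/\varphi(g))$. The sketch below only verifies this numerical consequence; it does not construct the isomorphism, and the function names are illustrative assumptions.

```python
from math import gcd


def totient(m):
    """Euler's totient, by direct counting."""
    return sum(1 for i in range(1, m + 1) if gcd(i, m) == 1)


def tensor_degrees(m1, m2):
    """Return (deg O_l/O_g, deg O_m1/O_g, deg O_m2/O_g) for g = gcd, l = lcm."""
    g = gcd(m1, m2)
    l = m1 * m2 // g
    return (totient(l) // totient(g),
            totient(m1) // totient(g),
            totient(m2) // totient(g))


deg_l, deg_1, deg_2 = tensor_degrees(12, 18)   # g = 6, l = 36
print(deg_l, deg_1 * deg_2)
```

When the common subring $\mathcal{O}_g$ is large, both relative degrees are small, which is the regime in which switching between the two rings is cheap.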

The  following  simple  lemma  will  be  central  to  our  techniques. 

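Lemma 2.2. Let $m_1, m_2$ be positive integers and $g = \gcd(m_1, m_2)$, $l = \mathrm{lcm}(m_1, m_2)$. Then for any
$\mathcal{O}_g$-linear function $L \colon \mathcal{O}_{m_1} \to \mathcal{O}_{m_2}$, there is an (efficiently computable) $\mathcal{O}_{m_2}$-linear function $\bar{L} \colon \mathcal{O}_l \to \mathcal{O}_{m_2}$
that coincides with $L$ on the subring $\mathcal{O}_{m_1} \subseteq \mathcal{O}_l$.

Proof. Write $\mathcal{O}_l = \mathcal{O}_{m_1} \otimes \mathcal{O}_{m_2}$, where the common base ring $\mathcal{O}_g$ is implicit. Let $\bar{L} \colon (\mathcal{O}_{m_1} \otimes \mathcal{O}_{m_2}) \to \mathcal{O}_{m_2}$
be the $\mathcal{O}_g$-linear function uniquely defined by $\bar{L}(a_1 \otimes a_2) = L(a_1) \cdot a_2 \in \mathcal{O}_{m_2}$ for all pure tensors $a_1 \otimes a_2$.
Then because $(a_1 \otimes a_2) \cdot b_2 = a_1 \otimes (a_2 b_2)$ for any $b_2 \in \mathcal{O}_{m_2}$ by the mixed-product property, $\bar{L}$ is also
$\mathcal{O}_{m_2}$-linear. Finally, for any $a_1 \in \mathcal{O}_{m_1}$ we have $\bar{L}(a_1 \otimes 1) = L(a_1)$ by construction. □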

2.1.3  Ideal  Factorization  and  Plaintext  Slots 

Here we recall the unique factorization of prime integers into prime ideals in cyclotomic rings, and, following
[SV11], how the Chinese remainder theorem can yield several plaintext “slots” that embed $\mathbb{Z}_q$ as a
subring, even for composite $q$. Similar facts for composite moduli are presented in [GHS12a], but in terms of
$p$-adic approximations and Hensel lifting. Here we give an ideal-theoretic interpretation using the Chinese
remainder theorem, which we believe is more elementary, and is a direct extension of the case of prime
moduli.

Let $p \in \mathbb{Z}$ be a prime integer. In the $m$th cyclotomic ring $R = \mathcal{O}_m = \mathbb{Z}[\zeta_m]$ (which has degree $n = \varphi(m)$
over $\mathbb{Z}$), the ideal $pR$ factors into prime ideals as follows. First write $m = \bar{m} \cdot p^k$ where $p \nmid \bar{m}$. Let $e = \varphi(p^k)$,
and let $d$ be the multiplicative order of $p$ modulo $\bar{m}$ (i.e., in $\mathbb{Z}_{\bar{m}}^*$), and note that $d$ divides $\varphi(\bar{m}) = n/e$. The ideal $pR$
then factors into the product of $e$th powers of $\varphi(\bar{m})/d = n/(de)$ distinct prime ideals $\mathfrak{p}_i$, i.e.,

$$pR = \prod_i \mathfrak{p}_i^{e}.$$

Each prime ideal $\mathfrak{p}_i$ has norm $|R/\mathfrak{p}_i| = p^d$, so each quotient ring $R/\mathfrak{p}_i$ is isomorphic to the finite field $\mathbb{F}_{p^d}$.
In particular, it embeds $\mathbb{Z}_p$ as a subfield. (Although we will not need this, the prime ideals are concretely
given by $\mathfrak{p}_i = pR + F_i(\zeta_m)R$, where $\Phi_{\bar{m}}(X) = \prod_i F_i(X) \pmod{p}$ is the mod-$p$ factorization of the $\bar{m}$th
cyclotomic polynomial into $\varphi(\bar{m})/d$ distinct irreducible polynomials of degree $d$.)

We now see how to obtain quotient rings of $R$ that embed the ring $\mathbb{Z}_q$, where $q = p^r$ for some integer $r \geq 1$.
(The case of arbitrary integer modulus $q$ follows immediately from the Chinese remainder theorem.) Here we
have the factorization $qR = \prod_i \mathfrak{p}_i^{re}$, and it turns out that each quotient ring $R/\mathfrak{p}_i^{re}$ embeds $\mathbb{Z}_q$ as a subring.

One easy way to see this is to notice that $q$ is the smallest power of $p$ in $\mathfrak{p}_i^{re}$, so the integers $\{0, \ldots, q-1\}$
representing $\mathbb{Z}_q$ are distinct modulo $\mathfrak{p}_i^{re}$.

By the Chinese Remainder Theorem (CRT), for $q = p^r$ the natural ring homomorphism from $R_q$ to
the product ring $\prod_i (R/\mathfrak{p}_i^{re})$ is an isomorphism. When the natural plaintext space of a cryptosystem is $R_q$,
we refer to the $\varphi(\bar{m})/d$ quotient rings $R/\mathfrak{p}_i^{re}$ as the plaintext “$\mathbb{Z}_q$-slots” (or just “slots”), and use them to
store vectors of $\mathbb{Z}_q$-elements via the CRT isomorphism. With this encoding, ring operations in $R_q$ induce
“batch” (or “SIMD”) component-wise operations on the corresponding vectors of $\mathbb{Z}_q$-elements. We note
that the CRT isomorphism is easy to compute in both directions. In particular, to map from a vector of
$\mathbb{Z}_q$-elements to $R_q$ just requires knowing a fixed mod-$q$ CRT set $C = \{c_i\} \subset R$ for which $c_i = 1 \pmod{\mathfrak{p}_i^{re}}$
and $c_i = 0 \pmod{\mathfrak{p}_j^{re}}$ for all $j \neq i$. Such a set can be precomputed using, e.g., a generalization of the
extended Euclidean algorithm.
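The slot encoding can be illustrated on a small case. The following is a minimal sketch, not the report's implementation: it assumes the toy parameters $m = 5$ and $q = p = 11$ (so $d = 1$ and there are $\varphi(5) = 4$ slots), represents $R_p$ as $\mathbb{Z}_{11}[X]/\Phi_5(X)$, and builds the CRT set by Lagrange interpolation rather than by the extended Euclidean algorithm. All function names are hypothetical.

```python
# Toy illustration: Z_11-slots of R_11 = Z_11[X] / Phi_5(X),
# where Phi_5(X) = X^4 + X^3 + X^2 + X + 1. Parameters are assumptions.
P = 11   # prime plaintext modulus with P = 1 (mod 5), so d = 1
M = 5    # cyclotomic index
N = 4    # phi(5), the number of slots

# One root per prime ideal p_i = (P, X - r_i): the primitive 5th roots of unity mod P.
ROOTS = [r for r in range(2, P) if pow(r, M, P) == 1]

def reduce_mod_phi5(coeffs):
    """Reduce a polynomial (low-order coefficient first) modulo Phi_5(X) and P."""
    c = list(coeffs) + [0] * max(0, N - len(coeffs))
    for i in range(len(c) - 1, N - 1, -1):
        # X^i = -(X^(i-1) + X^(i-2) + X^(i-3) + X^(i-4)) modulo Phi_5(X)
        top, c[i] = c[i], 0
        for j in range(1, N + 1):
            c[i - j] -= top
    return [x % P for x in c[:N]]

def ring_mul(a, b):
    """Multiply two elements of R_11."""
    prod = [0] * (2 * N - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return reduce_mod_phi5(prod)

def to_slots(a):
    """CRT isomorphism R_11 -> (Z_11)^4: reduce modulo each prime ideal."""
    return [sum(c * pow(r, k, P) for k, c in enumerate(a)) % P for r in ROOTS]

def crt_set():
    """Elements c_i with c_i = 1 mod p_i and c_i = 0 mod p_j for j != i."""
    cs = []
    for i, ri in enumerate(ROOTS):
        num, denom = [1], 1
        for j, rj in enumerate(ROOTS):
            if j == i:
                continue
            # Multiply num by (X - rj).
            num = [(lo - rj * hi) % P for lo, hi in zip([0] + num, num + [0])]
            denom = denom * (ri - rj) % P
        inv = pow(denom, -1, P)
        cs.append([c * inv % P for c in num])
    return cs

def from_slots(vals):
    """Inverse CRT map: combine slot values using the CRT set."""
    out = [0] * N
    for v, c in zip(vals, crt_set()):
        for k in range(N):
            out[k] = (out[k] + v * c[k]) % P
    return out
```

In this sketch, multiplying two ring elements with `ring_mul` multiplies their slot vectors component-wise, which is the “batch” behavior described above.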

Splitting in cyclotomic extension rings. Now consider a cyclotomic extension $R'/R$ where $R' = \mathcal{O}_{m'} =
\mathbb{Z}[\zeta_{m'}]$ for some $m'$ divisible by $m$. Then for each prime ideal $\mathfrak{p}_i \subset R$ dividing $pR$, the ideal $\mathfrak{p}_i R'$ factors into
equal powers of the same number of prime ideals $\mathfrak{p}'_{i,i'} \subset R'$, where all the $\mathfrak{p}'_{i,i'}$ are distinct. The ideal $\mathfrak{p}'_{i,i'}$
is said to “lie over” $\mathfrak{p}_i$ (and $\mathfrak{p}_i$ in turn lies over $p$). Since the $\mathfrak{p}'_{i,i'}$ are also the prime ideals appearing in the
factorization of $pR'$, we can determine their number and multiplicity exactly as above. Letting $\bar{m}'$, $e'$ and $d'$
be defined as above for $R'$, we know that $pR' = \prod_{i,i'} (\mathfrak{p}'_{i,i'})^{e'}$, where there are a total of $\varphi(\bar{m}')/d'$ distinct
prime ideals $\mathfrak{p}'_{i,i'}$. Therefore, each $\mathfrak{p}_i$ splits into exactly $(\varphi(\bar{m}') \cdot d)/(\varphi(\bar{m}) \cdot d')$ ideals; this number is
sometimes called the “relative splitting number” of $p$ in $R'/R$.
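The quantities $e$, $d$, the number of prime ideals, and the relative splitting number are all easy to compute from $m$, $m'$ and $p$. The following is a minimal sketch with hypothetical helper names; it uses naive loops that are adequate only for small indices.

```python
from math import gcd

def totient(n):
    """Euler's totient function (naive; fine for small n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def mult_order(a, n):
    """Multiplicative order of a modulo n; requires gcd(a, n) == 1."""
    if n == 1:
        return 1
    order, x = 1, a % n
    while x != 1:
        x = x * a % n
        order += 1
    return order

def splitting(m, p):
    """Return (e, d, count): pR is the product of `count` distinct prime
    ideals of norm p^d, each raised to the e-th power, in the m-th cyclotomic ring."""
    m_bar, pk = m, 1
    while m_bar % p == 0:      # write m = m_bar * p^k with p not dividing m_bar
        m_bar //= p
        pk *= p
    e = totient(pk)
    d = mult_order(p, m_bar)
    return e, d, totient(m_bar) // d

def relative_splitting(m, m_prime, p):
    """Number of prime ideals of O_{m'} lying over each prime ideal of O_m above p."""
    assert m_prime % m == 0
    return splitting(m_prime, p)[2] // splitting(m, p)[2]
```

For example, `splitting(7, 2)` gives `(1, 3, 2)`: the prime 2 factors into two unramified prime ideals of norm $2^3$ in the 7th cyclotomic ring.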

2.1.4  Product  Bases 

Our bootstrapping technique relies crucially on certain highly structured bases and CRT sets, which we call
“product bases (sets),” that arise from towers of cyclotomic rings. Let $\mathcal{O}_{m''} / \mathcal{O}_{m'} / \mathcal{O}_m$ be such a tower, let
$B'' = \{b''_{j''}\} \subset \mathcal{O}_{m''}$ be any $\mathcal{O}_{m'}$-basis of $\mathcal{O}_{m''}$, and let $B' = \{b'_{j'}\} \subset \mathcal{O}_{m'}$ be any $\mathcal{O}_m$-basis of $\mathcal{O}_{m'}$. Then it
follows immediately that the product set $B'' \cdot B' := \{b''_{j''} \cdot b'_{j'}\} \subset \mathcal{O}_{m''}$ is an $\mathcal{O}_m$-basis of $\mathcal{O}_{m''}$.³ Of course,
for a tower of several cyclotomic extensions and relative bases, we can obtain product bases that factor with a
corresponding degree of granularity.

Factorization of the powerful and decoding bases. An important structured $\mathbb{Z}$-basis of $\mathcal{O}_m$, called
the “powerful” basis in [LPR13], was defined in that work as the product of all the power $\mathbb{Z}$-bases
$\{\zeta^0, \zeta^1, \ldots, \zeta^{\varphi(p^e)-1}\}$ of $\mathcal{O}_{p^e}$ (where $\zeta = \zeta_{p^e}$), taken over all the maximal prime-power divisors $p^e$ of $m$.

In turn, it is straightforward to verify that the power $\mathbb{Z}$-basis of $\mathcal{O}_{p^e}$ can be obtained from the tower
$\mathcal{O}_{p^e} / \mathcal{O}_{p^{e-1}} / \cdots / \mathbb{Z}$, as the product of all the power $\mathcal{O}_{p^{i-1}}$-bases $\{\zeta_{p^i}^0, \ldots, \zeta_{p^i}^{d_i - 1}\}$ of $\mathcal{O}_{p^i}$ for $i = 1, \ldots, e$,
where $d_i = \varphi(p^i)/\varphi(p^{i-1}) \in \{p - 1, p\}$ is the degree of $\mathcal{O}_{p^i} / \mathcal{O}_{p^{i-1}}$. Therefore, the powerful basis has a
“finest possible” product structure. (This is not the case for other commonly used bases of $\mathcal{O}_m$, such as the
power $\mathbb{Z}$-basis, unless $m$ is a prime power.)

³ Formally, this basis is a Kronecker product of the bases $B''$ and $B'$, which is typically written using the $\otimes$ operator. We instead
use $\cdot$ to avoid confusion with pure tensors in a ring tensor product, which the elements of $B'' \cdot B'$ may not necessarily be.

Similarly, [LPR13] defines the “decoding” $\mathbb{Z}$-basis $D$ of a certain fractional ideal $\mathcal{O}_m^\vee = (g/\hat{m})\mathcal{O}_m$,
which is the “dual ideal” of $\mathcal{O}_m$, to be the dual basis of the conjugate powerful basis. Unlike the powerful basis,
the decoding basis has optimal noise tolerance (see [LPR13, Section 6.2]) and is therefore a best choice to use
in decryption, when using the dual ideal $\mathcal{O}_m^\vee$ appropriately in a cryptosystem. For simplicity, our formulation
of the cryptosystem (see Section 2.2) avoids using $\mathcal{O}_m^\vee$ by “scaling up” to $(\hat{m}/g)\mathcal{O}_m^\vee = \mathcal{O}_m$, and so we are
interested in factorizations of the scaled-up $\mathbb{Z}$-basis $(\hat{m}/g)D$ of $\mathcal{O}_m$. As shown in [LPR13, Lemma 6.3], this
basis is very closely related to the powerful basis, and has a nearly identical product structure arising from the
towers $\mathcal{O}_{p^e} / \mathcal{O}_{p^{e-1}} / \cdots / \mathbb{Z}$ for the maximal prime-power divisors $p^e$ of $m$. The only difference is in the choice
of the lowest-level $\mathbb{Z}$-bases of each $\mathcal{O}_p / \mathbb{Z}$, which are taken to be $\{\zeta_p^j + \zeta_p^{j+1} + \cdots + \zeta_p^{p-2}\}_{j \in \{0, \ldots, p-2\}}$ instead
of the power basis. In summary, the preferred $\mathbb{Z}$-basis of $\mathcal{O}_m$ used for decryption also has a finest-possible
product structure.

Factorization of CRT sets. Using the splitting behavior of primes and prime ideals, we can also define
CRT sets having a finest-possible product structure. First consider any cyclotomic extension $\mathcal{O}_{m'} / \mathcal{O}_m$, and
suppose that prime integer $p$ splits in $\mathcal{O}_m$ into distinct prime ideals $\mathfrak{p}_i$. In turn, each $\mathfrak{p}_i$ splits in $\mathcal{O}_{m'}$ into the
same number $k$ of prime ideals $\mathfrak{p}'_{i,i'}$, which are all distinct. For simplicity, assume for now that $p$ does not
divide $m$ or $m'$, so none of the ideals occur with multiplicity.

A mod-$p$ CRT set $C = \{c_i\}$ for $\mathcal{O}_m$ satisfies $c_i = 1 \pmod{\mathfrak{p}_i}$ and $c_i = 0 \pmod{\mathfrak{p}_j}$ for $j \neq i$; therefore,
$c_i = 1 \pmod{\mathfrak{p}'_{i,i'}}$ and $c_i = 0 \pmod{\mathfrak{p}'_{j,i'}}$ for all $i'$ and all $j \neq i$. We can choose a set $S = \{s_{i'}\} \subset \mathcal{O}_{m'}$
of size $k$ such that $C' = S \cdot C$ is a mod-$p$ CRT set for $\mathcal{O}_{m'}$, as follows: partition the ideals $\mathfrak{p}'_{i,i'}$ arbitrarily
according to $i'$, and define $s_{i'} \in \mathcal{O}_{m'}$ to be congruent to 1 modulo all those ideals $\mathfrak{p}'_{i,i'}$ in the $i'$th component
of the partition, and 0 modulo all the other ideals $\mathfrak{p}'_{i,j'}$. Then it is immediate that each product $c_i \cdot s_{i'}$
is 1 modulo $\mathfrak{p}'_{i,i'}$ and 0 modulo all other $\mathfrak{p}'_{j,j'}$. Therefore, $C' = S \cdot C$ is a mod-$p$ CRT set for $\mathcal{O}_{m'}$. The
generalization of this process to the case where $p$ factors into powers of the ideals, and to moduli $q = p^r$, is
immediate.

For an arbitrary cyclotomic index $m$, consider any cyclotomic tower $\mathcal{O}_m / \cdots / \mathbb{Z}$. Then a mod-$q$ CRT set
with corresponding product structure can be obtained by iteratively applying the above procedure at each
level of the tower. A finest-possible product structure is obtained by using a tower of maximal length (i.e., one
in which the ratio of indices at adjacent levels is always prime).

2.2  Ring-Based  Homomorphic  Cryptosystem 

Here we recall a somewhat-homomorphic encryption scheme whose security is based on the ring-LWE
problem [LPR10] in arbitrary cyclotomic rings. For our purposes we focus mainly on its decryption function,
though below we also recall its support for “ring switching” [GHPS12]. For further details on its security
guarantees, various homomorphic properties, and efficient implementation, see [LPR10, BV11b, BGV12,
GHS12b, GHPS12, LPR13].

Let $R = \mathcal{O}_m \subseteq R' = \mathcal{O}_{m'}$ be respectively the $m$th and $m'$th cyclotomic rings, where $m \mid m'$. The
plaintext ring is the quotient ring $R_p$ for some integer $p$; ciphertexts are made up of elements of $R'_q$ for some
integer $q$, which for simplicity we assume is divisible by $p$; and the secret key is some $s' \in R'$. The case
$m = 1$ corresponds to “non-packed” ciphertexts, which encrypt elements of $\mathbb{Z}_p$ (e.g., single bits), whereas
$m = m'$ corresponds to “packed” ciphertexts, and $1 < m < m'$ corresponds to what we call “semi-packed”
ciphertexts. Note that without loss of generality we can treat any ciphertext as packed, since $R'_p$ embeds $R_p$.
But the smaller $m$ is, the simpler and more practically efficient our bootstrapping procedure can be. Since
our focus is on refreshing ciphertexts that have large noise rate, we can think of $m'$ as being somewhat small
(e.g., in the several hundreds) via ring-switching [GHPS12], and $q$ also as being somewhat small (e.g., in the
several thousands) via modulus-switching. Our main focus in this work is on a plaintext modulus $p$ that is a
power of two, though for generality we present all our techniques in terms of arbitrary $p$.

A ciphertext encrypting a message $\mu \in R_p$ under secret key $s' \in R'$ is some pair $c' = (c'_0, c'_1) \in R'_q \times R'_q$
satisfying the relation

$$c'_0 + c'_1 \cdot s' = \frac{q}{p} \cdot \mu + e' \pmod{qR'} \qquad (2.2)$$

for some error (or “noise”) term $e' \in R'$ such that $e' \cdot g' \in g'R'$ is sufficiently “short,” where $g' \in R'$ is as
defined in Equation (2.1).⁴ Informally, the “noise rate” of the ciphertext is the ratio of the “size” of $e'$ (or
more precisely, the magnitude of its coefficients in a suitable basis) to $q/p$.

We note that Equation (2.2) corresponds to what is sometimes called the “most significant bit” (msb)
message encoding, whereas somewhat-homomorphic schemes are often defined using “least significant
bit” (lsb) encoding, in which $p$ and $q$ are coprime and $c'_0 + c'_1 s' = e' \pmod{qR'}$ for some error term
$e' \in \mu + pR'$. For our purposes the msb encoding is more natural, and in any case the two encodings are
essentially equivalent: when $p$ and $q$ are coprime, we can trivially switch between the two encodings simply by
multiplying by $p$ or $p^{-1}$ modulo $q$ (see Appendix A). When $p$ divides $q$, we can use homomorphic operations
for the msb encoding due to Brakerski [Bra12]; alternatively, we can switch to and from a different modulus
$q'$ that is coprime with $p$, allowing us to switch between lsb and msb encodings as just described. In practice,
it may be preferable to use homomorphic operations for the lsb encoding, because they admit optimizations
(e.g., the “double-CRT representation” [GHS12c]) that may not be possible for the msb operations (at least
when $p$ divides $q$).


2.2.1  Decryption 


At a high level, the decryption algorithm works in two steps: the “linear” step simply computes $v' =
c'_0 + c'_1 \cdot s' = \frac{q}{p} \cdot \mu + e' \in R'_q$, and the “non-linear” step outputs $\lfloor v' \rceil_p \in R_p$ using a certain “ring rounding
function” $\lfloor \cdot \rceil_p : R'_q \to R_p$. As long as the error term $e'$ is within the tolerance of the rounding function, the
output will be $\mu \in R_p$. This is all entirely analogous to decryption in LWE-based systems, but here the
rounding is $n$-dimensional, rather than just from $\mathbb{Z}_q$ to $\mathbb{Z}_p$.

Concretely, the ring rounding function $\lfloor \cdot \rceil_p : R'_q \to R_p$ is defined in terms of the integer rounding function
$\lfloor \cdot \rceil : \mathbb{Z}_q \to \mathbb{Z}_p$ and a certain “decryption” $\mathbb{Z}$-basis $B' = \{b'_j\}$ of $R'$, as follows.⁵ Represent the input $v' \in R'_q$
in the decryption basis as $v' = \sum_j v'_j \cdot b'_j$ for some coefficients $v'_j \in \mathbb{Z}_q$, then independently round the
coefficients, yielding an element $\sum_j \lfloor v'_j \rceil \cdot b'_j \in R'_p$ that corresponds to the message $\mu \in R_p$ (under the
standard embedding of $R_p$ into $R'_p$).
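The coefficient-wise rounding can be sketched as follows. This is an illustrative toy, not the report's code: it assumes the input is already given as a list of $\mathbb{Z}_q$-coefficients with respect to the decryption basis, and the function names and parameter values are hypothetical.

```python
def round_to_zp(v, q, p):
    """Integer rounding Z_q -> Z_p: nearest integer to (p/q) * v, reduced mod p."""
    return ((2 * p * (v % q) + q) // (2 * q)) % p

def ring_round(coeffs, q, p):
    """Round each decryption-basis coefficient independently."""
    return [round_to_zp(v, q, p) for v in coeffs]

# Toy parameters (assumed): q = 64, p = 4, so the rounding tolerates |e_j| < q/(2p) = 8.
q, p = 64, 4
mu = [3, 0, 1, 2]                 # message coefficients in Z_p
err = [5, -7, 2, -1]              # error coefficients within tolerance
v = [((q // p) * m + e) % q for m, e in zip(mu, err)]   # noisy msb encoding
recovered = ring_round(v, q, p)
```

Here `recovered` equals `mu` because every error coefficient has magnitude below $q/(2p)$.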


⁴ Quantitatively, “short” is defined with respect to the canonical embedding of $R'$, whose precise definition is not needed in this
work. The above system is equivalent to the one from [LPR13] in which the message, error term, and ciphertext components are all
taken over the “dual” fractional ideal $(R')^\vee = (g'/\hat{m}')R'$ in the $m'$th cyclotomic number field, and the error term has an essentially
spherical distribution over $(R')^\vee$. In that system, decryption is best accomplished using a certain $\mathbb{Z}$-basis of $(R')^\vee$, called the
decoding basis, which optimally decodes spherical errors. The above formulation is more convenient for our purposes, and simply
corresponds with multiplying everything in the system of [LPR13] by an $\hat{m}'/g'$ factor. This makes $e' \cdot g' \in g'R' = \hat{m}'(R')^\vee$ short
and essentially spherical in our formulation. See [LPR10, LPR13] for further details.

⁵ In our formulation, the basis $B'$ is $(\hat{m}'/g')$ times the decoding basis of $(R')^\vee$. See Section 2.1.4 and Footnote 4.
2.2.2  Changing  the  Plaintext  Modulus


We use two operations on ciphertexts that alter the plaintext modulus $p$ and encrypted message $\mu \in R_p$. The
first operation changes $p$ to any multiple $p' = dp$, and produces an encryption of some $\mu' \in R'_{p'}$ such that
$\mu' = \mu \pmod{pR'}$. To do this, it simply “lifts” the input ciphertext $c' = (c'_0, c'_1) \in (R'_q)^2$ to an arbitrary
$c'' = (c''_0, c''_1) \in (R'_{q'})^2$ such that $c''_j = c'_j \pmod{qR'}$, where $q' = dq$. This works because

$$c''_0 + c''_1 \cdot s' \in c'_0 + c'_1 \cdot s' + qR' = \left(\frac{q}{p} \cdot \mu + e'\right) + qR' = \frac{q'}{p'} \cdot (\mu + pR') + e' \pmod{q'R'}.$$

Notice that this leaves the noise rate unchanged, because the noise term is still $e'$, and $q'/p' = q/p$.

The second operation applies to an encryption of a message $\mu \in R_p$ that is known to be divisible by some
divisor $d$ of $p$, and produces an encryption of $\mu/d \in R_{p/d}$. The operation actually leaves the ciphertext $c'$
unchanged; it just declares the associated plaintext modulus to be $p/d$ (which affects how decryption is
performed). This works because

$$c'_0 + c'_1 \cdot s' = \frac{q}{p} \cdot \mu + e' = \frac{q}{p/d} \cdot (\mu/d) + e' \pmod{qR'}.$$

Notice that the noise rate of the ciphertext has been divided by $d$, because the noise term is still $e'$ but
$q/(p/d) = d\,(q/p)$.
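Both operations can be checked numerically in the simplest setting. The following sketch assumes $R' = \mathbb{Z}$ (i.e., $m' = 1$), so that ring elements are plain integers, and uses small hypothetical parameter values chosen only for illustration.

```python
def round_to_zp(v, q, p):
    """Integer rounding Z_q -> Z_p: nearest integer to (p/q) * v, reduced mod p."""
    return ((2 * p * (v % q) + q) // (2 * q)) % p

def decrypt(c0, c1, s, q, p):
    """msb decryption over the integers: round c0 + c1*s from Z_q to Z_p."""
    return round_to_zp((c0 + c1 * s) % q, q, p)

s, c1 = 5, 7                       # toy secret key and ciphertext component

# First operation: raise the plaintext modulus from p to p' = d*p.
q, p, mu, e = 16, 2, 1, 1
c0 = ((q // p) * mu + e - c1 * s) % q          # so c0 + c1*s = (q/p)*mu + e (mod q)
d = 3
q_big, p_big = d * q, d * p
# "Lifting" reuses the same integer representatives, now read modulo q' = d*q.
mu_big = decrypt(c0, c1, s, q_big, p_big)

# Second operation: a message divisible by d2 is reinterpreted modulo p2/d2.
q2, p2, mu2, e2, d2 = 16, 4, 2, 1, 2
c0_2 = ((q2 // p2) * mu2 + e2 - c1 * s) % q2
mu_small = decrypt(c0_2, c1, s, q2, p2 // d2)  # same ciphertext, smaller modulus
```

In this run `mu_big` is congruent to `mu` modulo $p$, and `mu_small` equals `mu2 // d2`, matching the two identities above.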


2.2.3  Ring  Switching 

We rely heavily on the cryptosystem’s support for switching ciphertexts to a cyclotomic subring $S'$ of $R'$,
which as a side-effect homomorphically evaluates any desired $S'$-linear function on the plaintext. Notice
that the linear function $L$ is applied to the plaintext as embedded in $R'_p$; this obviously applies the induced
function on the true plaintext space $R_p$.

Proposition 2.3 ([GHPS12], full version). Let $S' \subseteq R'$ be cyclotomic rings. Then the above-described
cryptosystem supports the following homomorphic operation: given any $S'$-linear function $L : R'_p \to S'_p$
and a ciphertext over $R'_q$ encrypting (with sufficiently small error term) a message $\mu \in R'_p$, the output is a
ciphertext over $S'_q$ encrypting $L(\mu) \in S'_p$.

The security of the procedure described in Proposition 2.3 is based on the hardness of the ring-LWE
problem in $S'$, so the dimension of $S'$ must be sufficiently large. The procedure itself is quite simple and
efficient: it first switches to a secret key that lies in the subring $S'$, then it multiplies the resulting ciphertext
by an appropriate fixed element of $R'$ (which is determined solely by the function $L$). Finally, it applies to the
ciphertext the trace function $\mathrm{Tr}_{R'/S'} : R' \to S'$. All of these operations are quasi-linear time in the dimension
of $R'/\mathbb{Z}$, and very efficient in practice. In particular, the trace is a trivial linear-time operation when elements
are represented in any of the bases we use. The ring-switching procedure increases the effective error rate of
the ciphertext by a factor of about the square root of the dimension of $R'$, which is comparable to that of a
single homomorphic multiplication. See [GHPS12] for further details.


3  Overview  of  Bootstrapping  Procedure 

Here we give a high-level description of our bootstrapping procedure. We present a unified procedure for
non-packed, packed, and semi-packed ciphertexts, but note that for non-packed ciphertexts, Steps 3a and 3c
(and possibly 1c) are null operations, while for packed ciphertexts, Steps 1b, 1c and 2 are null operations.
Recalling the cryptosystem from Section 2.2, the plaintext ring is $R_p$ and the ciphertext ring is $R'_q$, where
$R = \mathcal{O}_m \subseteq R' = \mathcal{O}_{m'}$ are cyclotomic rings (so $m \mid m'$), and $q$ is a power of $p$. The procedure also uses a
larger cyclotomic ring $R'' = \mathcal{O}_{m''} \supseteq R'$ (so $m' \mid m''$) to work with ciphertexts that encrypt elements of the
original ciphertext ring $R'_q$. To obtain quasilinear runtimes and exponential hardness (under standard hardness
assumptions), our procedure imposes some mild conditions on the indices $m$, $m'$, and $m''$:


•  The dimension $\varphi(m'')$ of $R''$ must be quasilinear, so we can represent elements of $R''$ efficiently.

•  For Steps 2 and 3, all the prime divisors of $m$ and $m'$ must be small (i.e., polylogarithmic).

•  For Step 3, $m$ and $m''/m$ must be coprime, which implies that $m$ and $m'/m$ must be coprime also.
Note that the former condition is always satisfied for non-packed ciphertexts (where $m = 1$). For
packed ciphertexts (where $m = m'$), the latter condition is always satisfied, which makes it easy
to choose a valid $m''$. For semi-packed ciphertexts (where $1 < m < m'$), we can always satisfy
the latter condition either by increasing $m$ (at a small expense in practical efficiency in Step 3; see
Section 5.1.3), or by effectively decreasing $m$ slightly (at a possible improvement in practical efficiency;
see Section 3.2).

For example, when $m = 1$, both $m'$ and $m''$ can be powers of two.
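The divisibility and coprimality conditions are simple to test for candidate indices. The following sketch uses hypothetical function names; it deliberately does not check the asymptotic conditions (quasilinear dimension, polylogarithmic prime divisors), and only reports the largest prime divisor so that a caller can compare it against whatever bound is considered “small.”

```python
from math import gcd

def indices_compatible(m, m1, m2):
    """Check m | m1 | m2 and gcd(m, m2 / m) == 1, where m1 = m' and m2 = m''."""
    if m1 % m != 0 or m2 % m1 != 0:
        return False
    return gcd(m, m2 // m) == 1

def largest_prime_divisor(n):
    """Largest prime dividing n (returns 1 for n == 1); trial division."""
    largest, f = 1, 2
    while f * f <= n:
        while n % f == 0:
            largest, n = f, n // f
        f += 1
    return max(largest, n) if n > 1 else largest
```

For instance, `indices_compatible(1, 256, 1024)` holds (the non-packed, power-of-two case mentioned above), whereas `indices_compatible(3, 9, 9)` fails because $m = 3$ and $m''/m = 3$ share a factor.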

The input to the procedure is a ciphertext $c' = (c'_0, c'_1) \in (R'_q)^2$ that encrypts some plaintext $\mu \in R_p$
under a secret key $s' \in R'$, i.e., it satisfies the relation

$$v' = c'_0 + c'_1 \cdot s' = \frac{q}{p} \cdot \mu + e' \pmod{qR'}$$

for some small enough error term $e' \in R'$. The procedure computes a new encryption of $\lfloor v' \rceil_p = \mu$ (under
some secret key, not necessarily $s'$) that has substantially smaller noise rate than the input ciphertext. It
proceeds as follows (explanatory remarks appear in italics):

1.  Convert $c'$ to a “noiseless” ciphertext $c''$ over a large ring $R''_Q$ that encrypts a plaintext $(g'/g)u' \in R'_{q'}$,
where $g' \in R'$, $g \in R$ and $\hat{m}, \hat{m}' \in \mathbb{Z}$ are as defined in (and following) Equation (2.1), $q' = (\hat{m}'/\hat{m})q$,
and $u' = v' \pmod{qR'}$. This proceeds in the following sub-steps (see Section 3.1 for further details).

Note that $g'/g \in R'$ by definition, and that it divides $\hat{m}'/\hat{m}$.

(a)  Reinterpret $c'$ as a noiseless encryption of $v' = \frac{q}{p} \cdot \mu + e' \in R'_q$ as a plaintext, noting that both
the plaintext and ciphertext rings are now taken to be $R'_q$.

This is purely a conceptual change in perspective, and does not involve any computation.

(b)  Using the procedure described in Section 2.2.2, change the plaintext (and ciphertext) modulus to
$q' = (\hat{m}'/\hat{m})q$, yielding a noiseless encryption of some $u' \in R'_{q'}$ such that $u' = v' \pmod{qR'}$.
Note that this step is a null operation if the original ciphertext was packed, i.e., if $m = m'$.

We need to increase the plaintext modulus because homomorphically computing $\mathrm{Tr}_{R'/R}$ in Step 2
below introduces an $\hat{m}'/\hat{m}$ factor into the plaintext, which we will undo by scaling the plaintext
modulus back down to $q$. (See Section 3.2 for an alternative choice of $q'$.)

(c)  Multiply the ciphertext from the previous step by $g'/g \in R'$, yielding a noiseless encryption of
plaintext $(g'/g)u' \in R'_{q'}$.

The factor $(g'/g) \in R'$ is needed when we homomorphically compute $\mathrm{Tr}_{R'/R}$ in Step 2 below.
Note that $g'/g = 1$ if and only if every odd prime divisor of $m'$ also divides $m$, e.g., if $m = m'$.
(d)  Convert to a noiseless ciphertext $c''$ that still encrypts $(g'/g)u' \in R'_{q'}$, but using a large enough
ciphertext ring $R''_Q$ for some $R'' = \mathcal{O}_{m''} \supseteq R'$ and modulus $Q \gg q'$.

A larger ciphertext ring $R''_Q$ is needed for security in the upcoming homomorphic operations, to
compensate for the low noise rates that will need to be used. These operations will expand the
initial noise rate by a quasipolynomial $\lambda^{O(\log \lambda)}$ factor in total, so the dimension of $R''$ and the
bit length of $Q$ can be $\tilde{O}(\lambda)$ and $\tilde{O}(1)$, respectively.


The remaining steps are described here only in terms of their effect on the plaintext value and ring. Using
ring- and modulus-switching, the ciphertext ring $R''$ and modulus $Q$ may be made smaller as is convenient,
subject to the security and functionality requirements. (Also, the ciphertext ring implicitly changes during
Steps 3a and 3c.)


2.  Homomorphically apply the scaled trace function $(\hat{m}/\hat{m}')\,\mathrm{Tr}_{R'/R}$ to the encryption of $(g'/g)u' \in R'_{q'}$,
to obtain an encryption of plaintext

$$u = \frac{\hat{m}}{\hat{m}'} \cdot \mathrm{Tr}_{R'/R}\left(\frac{g'}{g} \cdot u'\right) = \frac{q}{p} \cdot \mu + e \in R_q$$

for some suitably small error term $e \in R$. See Section 4 for further details.

This step changes the plaintext ring from $R'_{q'}$ to $R_q$, and homomorphically isolates the noisy $R_q$-encoding
of $\mu$. It is a null operation if the original ciphertext was packed, i.e., if $m = m'$.


3.  Homomorphically apply the ring rounding function $\lfloor \cdot \rceil_p : R_q \to R_p$, yielding an output ciphertext that
encrypts $\lfloor u \rceil_p = \mu \in R_p$. This proceeds in three sub-steps, all of which are applied homomorphically
(see Section 5 for details):

(a)  Map the coefficients $u_j$ of $u \in R_q$ (with respect to the decryption basis $B$ of $R$) to the $\mathbb{Z}_q$-slots
of a ring $S_q$, where $S$ is a suitably chosen cyclotomic.

This step changes the plaintext ring from $R_q$ to $S_q$. It is a null operation if the original ciphertext
was non-packed (i.e., if $m = 1$), because we can let $S = R = \mathbb{Z}$.

(b)  Batch-apply the integer rounding function $\lfloor \cdot \rceil : \mathbb{Z}_q \to \mathbb{Z}_p$ to the $\mathbb{Z}_q$-slots of $S_q$, yielding a
ciphertext that encrypts the values $\mu_j = \lfloor u_j \rceil \in \mathbb{Z}_p$ in its $\mathbb{Z}_p$-slots.

This step changes the plaintext ring from $S_q$ to $S_p$. It constitutes the only non-linear operation on
the plaintext, with multiplicative depth $\lceil \lg p \rceil \cdot (\log_p(q) - 1) \approx \log(q)$, and as such is the most
expensive in terms of runtime, noise expansion, etc.

(c)  Reverse the map from Step 3a, sending the values $\mu_j$ from the $\mathbb{Z}_p$-slots of $S_p$ to coefficients
with respect to the decryption basis $B$ of $R_p$, yielding an encryption of $\mu = \sum_j \mu_j b_j \in R_p$.

This step changes the plaintext ring from $S_p$ to $R_p$. Just like Step 3a, it is a null operation for
non-packed ciphertexts.


3.1  Obtaining  a  Noiseless  Ciphertext 


Step 1 of our bootstrapping procedure is given as input a ciphertext $c' = (c'_0, c'_1)$ over $R'_q$ that encrypts
(typically with a high noise rate) a message $\mu \in R_p$ under key $s' \in R'$, i.e., $v' = c'_0 + c'_1 \cdot s' = \frac{q}{p} \cdot \mu + e' \in R'_q$
for some error term $e'$. We first change our perspective and view $c'$ as a “noiseless” encryption (still under $s'$)
of the plaintext value $v' \in R'_q$, taking both the plaintext and ciphertext rings to be $R'_q$. This view is indeed
formally correct, because

$$c'_0 + c'_1 \cdot s' = \frac{q}{q} \cdot v' + 0 \pmod{qR'}.$$

Next, in preparation for the upcoming homomorphic operations we increase the plaintext (and ciphertext)
modulus to $q'$, and multiply the resulting ciphertext by $g'/g$. These operations clearly preserve noiselessness.
Finally, we convert the ciphertext ring to $R''_Q$ for a sufficiently large cyclotomic $R'' \supseteq R'$ and modulus $Q \gg q'$
that is divisible by $q'$. This is done by simply embedding $R'$ into $R''$ and introducing extra precision, i.e.,
scaling the ciphertext up by a $Q/q'$ factor. It is easy to verify that these operations also preserve noiselessness.


3.2  Variants  and  Optimizations 

Our  basic  procedure  admits  a  few  minor  variants  and  practical  optimizations,  which  we  discuss  here. 

Smaller temporary modulus $q'$. In Step 1b we increase the plaintext modulus from $q$ to $q' = rq$ where
$r = \hat{m}'/\hat{m}$, and at the end of Step 2 we reduce the modulus back to $q$ because the plaintext is divisible by $r$.
The net effect of this, versus using a modulus $q$ throughout, is that the modulus $Q$ is larger by an $r$ factor, as
are the error rates used for key-switching in Step 2. This does not affect the asymptotic cost of bootstrapping,
but it may have a small impact in practice. Instead, we can increase the modulus to only $q' = (r/d)q$, where $d$
is the largest divisor of $r$ coprime with $q$. Then in Step 2 we can remove an $(r/d)$ factor from the plaintext by
scaling the modulus back down to $q$, and keep track of the remaining $d$ factor and remove it upon decryption.
(We could also remove the $d$ factor by multiplying the ciphertext by $d^{-1} \bmod q$, but this would increase the
noise rate by up to a $q/2$ factor, which is typically much larger than the $\hat{m}'/\hat{m}$ factor we were trying to avoid
in the first place.)
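The choice of $d$ and of the smaller temporary modulus can be computed directly. The sketch below uses hypothetical function names and repeatedly strips from $r$ every factor it shares with $q$.

```python
from math import gcd

def largest_coprime_divisor(r, q):
    """Largest divisor d of r with gcd(d, q) == 1."""
    d = r
    g = gcd(d, q)
    while g > 1:
        d //= g
        g = gcd(d, q)
    return d

def temporary_modulus(r, q):
    """Return (q', d) with q' = (r/d) * q, where d is the part of r coprime with q."""
    d = largest_coprime_divisor(r, q)
    return (r // d) * q, d
```

For example, with $r = 12$ and $q = 2^{10}$ this gives $d = 3$ and $q' = 4q$, rather than the $q' = 12q$ used by the basic procedure.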


Using  a  smaller  index  m  in  Steps  [2]  and  [3j  Steps  [3a]  and  [3c]  can  be  much  more  costly  in  practice  than 
Step [2j  because  they  require  working  with  rings  that  have  at  least  (p(rn)  Z^-slots.  As  the  number  of  needed 
slots  increases,  the  indices  of  such  rings  tend  to  grow  quickly,  and  involve  more  prime  divisors  of  larger 
size  (though  asymptotically  the  indices  remain  quasilinear);  see  Appendix |C| for  some  examples.  So,  in 
practice  it  may  be  faster  to  invoke  Step[3]fl/<?w  times  to  evaluate  the  rounding  function  over  a  smaller  ring 
R  =  Ofh  C  R,  for  some  proper  divisor  rh  of  m.  Our  procedure  can  be  adapted  to  work  in  this  way,  even  if 
the  original  plaintext  //  is  an  arbitrary  element  of  the  plaintext  space  Rp. 

The  main  facts  we  use  are  that  the  decryption  basis  B  of  R  factors  as  B  =  B'  ■  B,  where  B  is  the 
decryption  basis  of  R,  and  in  particular  B'  is  an  optimally  short  A-basis  of  R.  (See  Section  2.1.4)  Moreover, 


applying  the  ring  rounding  function  on  any  u  £  Rq  is  equivalent  to  independently  applying  the  ring  rounding 
function  on  each  of  u’s  ^-coefficients  with  respect  to  B'.  Lastly,  the  ^-coefficients  of  u  can  be  individually 
extracted  using  the  trace  function  Tr^^  on  certain  fixed  (short)  multiples  of  u.  (This  all  just  generalizes  the 
case  R  =  Z  in  the  natural  way.)  Using  these  facts,  in  Stepj2jwe  can  homomorphically  apply  Tr^,  ,^,  several 

times  to  obtain  encryptions  of  the  /(,,-cocfficicnts  of  the  noisy  encoding  u  ~  (q/p)  ■  //,  then  use  Step  [3]to 
homomorphically  round  those  coefficients  to  get  the  /(,, -coefficients  of'//  £  Rp,  and  finally  reassemble  the 
pieces  by  homomorphically  multiplying  by  the  short  basis  elements  in  B' ,  and  summing  the  results. 

Note that the above method requires evaluating $\mathrm{Tr}_{R'/\tilde{R}}$ a total of $\varphi(m)/\varphi(\tilde{m})$ times in Step 2, and the same goes for the $\tilde{R}_q$ rounding function in Step 3. Because each evaluation takes quasilinear time no matter what $\tilde{m}$ is, the asymptotic performance can only worsen as $\tilde{m}$ decreases. However, in practice there may be benefits in choosing $\tilde{m}$ to be slightly smaller than $m$.

4  Homomorphic  Trace

Here we show how to perform Step 2 of our bootstrapping procedure, which homomorphically evaluates the scaled trace function $(\hat{m}/\hat{m}')\,\mathrm{Tr}_{R'/R}$ on an encryption of $(g'/g)u' \in R'_{q'}$, where recall that: $g' \in R'$, $g \in R$ are as defined in Equation (2.1), and $(g'/g)$ divides $(\hat{m}'/\hat{m})$; the plaintext modulus is $q' = (\hat{m}'/\hat{m})q$; and

$$u' = v' = \frac{q}{p} \cdot \mu + e' \pmod{qR'},$$

where $e' \cdot g' \in g'R'$ is sufficiently short. Our goal is to show that:

1. the scaled trace of the plaintext $(g'/g)u'$ is some $u = \frac{q}{p} \cdot \mu + e \in R_q$, where $e \cdot g \in gR$ is short, and

2. we can efficiently homomorphically apply the scaled trace on a ciphertext $c''$ over some larger ring $R'' = \mathcal{O}_{m''} \supseteq R'$.


4.1  Trace  of  the  Plaintext 


We first show the effect of the scaled trace on the plaintext $(g'/g)u' \in R'_{q'}$. By the above description of $u' \in R'_q$, and the fact that $(g'/g)q$ divides $q' = (\hat{m}'/\hat{m})q$, we have

$$(g'/g)u' = (g'/g)v' = (g'/g) \cdot \left(\frac{q}{p} \cdot \mu + e'\right) \pmod{(g'/g)qR'}.$$

Therefore, letting $\mathrm{Tr} = \mathrm{Tr}_{R'/R}$, by $R$-linearity of the trace and Lemma 2.1 we have

$$\mathrm{Tr}\big((g'/g)u'\big) = \mathrm{Tr}(g'/g) \cdot \frac{q}{p} \cdot \mu + \mathrm{Tr}(e' \cdot g')/g = \frac{\hat{m}'}{\hat{m}} \left(\frac{q}{p} \cdot \mu + e\right) \pmod{q'R},$$

where $e = (\hat{m}/\hat{m}')\,\mathrm{Tr}(e' \cdot g')/g \in R$. Therefore, after scaling down the plaintext modulus $q'$ by an $\hat{m}'/\hat{m}$ factor (see Section 2.2.2), the plaintext is $\frac{q}{p} \cdot \mu + e \in R_q$.

Moreover, $e \cdot g = (\hat{m}/\hat{m}')\,\mathrm{Tr}(e' \cdot g') \in gR$ is short because $e' \cdot g' \in g'R'$ is short; see, e.g., [GHPS12, Corollary 2.2]. In fact, by basic properties of the decoding/decryption basis (as defined in [LPR13]) under the trace, the coefficient vector of $e$ with respect to the decryption basis of $R$ is merely a subvector of the coefficient vector of $e'$ with respect to the decryption basis of $R'$. Therefore, $e$ is within the error tolerance of the rounding function on $R_q$, assuming $e'$ is within the error tolerance of the rounding function on $R'_q$.


4.2  Applying  the  Trace 


Now we show how to efficiently homomorphically apply the scaled trace function $(\hat{m}/\hat{m}')\,\mathrm{Tr}_{R'/R}$ to an encryption of any plaintext in $R'_{q'}$ that is divisible by $(g'/g)$. Note that this condition ensures that the output of the trace is a multiple of $\hat{m}'/\hat{m}$ in $R_{q'}$ (see Lemma 2.1), making the scaling a well-defined operation that results in an element of $R_q$.

First recall that $\mathrm{Tr}_{R'/R}$ is the sum of all $\varphi(m')/\varphi(m)$ automorphisms of $R'/R$, i.e., automorphisms of $R'$ that fix $R$ pointwise. Therefore, one way of homomorphically computing the scaled trace is to homomorphically apply the proper automorphisms, sum the results, and scale down the plaintext and its modulus. While this "sum-automorphisms" procedure yields the correct result, computing the trace in this way does not run in quasilinear time, unless the number $\varphi(m')/\varphi(m)$ of automorphisms is only polylogarithmic.

Instead, we consider a sufficiently fine-grained tower of cyclotomic rings

$$R^{(r)} / \cdots / R^{(1)} / R^{(0)},$$

where $R' = R^{(r)}$, $R = R^{(0)}$, and each $R^{(i)} = \mathcal{O}_{m_i}$, where $m_i$ is divisible by $m_{i-1}$ for $i > 0$; for the finest granularity we would choose the tower so that every $m_i/m_{i-1}$ is prime. Notice that the scaled trace function $(\hat{m}/\hat{m}')\,\mathrm{Tr}_{R'/R}$ is the composition of the scaled trace functions $(\hat{m}_{i-1}/\hat{m}_i)\,\mathrm{Tr}_{R^{(i)}/R^{(i-1)}}$, and that $g'/g$ is the product of all $g^{(i)}/g^{(i-1)}$ for $i = 1, \ldots, r$, where $g^{(i)} \in R^{(i)}$ is as defined in Equation (2.1). So, another way of homomorphically applying the full scaled trace is to apply the corresponding scaled trace in sequence for each level of the tower, "climbing down" from $R' = R^{(r)}$ to $R = R^{(0)}$. In particular, if we use the above sum-automorphisms procedure with a tower of finest granularity, then there are at most $\log_2(m'/m) = O(\log \lambda)$ levels, and since we have assumed that every prime divisor of $m'/m$ is bounded by polylogarithmic in the security parameter $\lambda$, the full procedure will run in quasilinear $\tilde{O}(\lambda)$ time.
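To make the saving concrete, the following sketch builds a finest-granularity tower of indices and counts the automorphisms summed at each level. It is only an illustration of the counting argument: the helper names (`totient`, `prime_factors`, `finest_tower`, `automorphisms_per_level`) and the toy indices $m = 4$, $m' = 180$ are assumptions chosen for this example, not parameters from the text.

```python
from math import gcd


def totient(n):
    """Euler's totient, computed naively (fine for toy indices)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def prime_factors(n):
    """Prime factors of n with multiplicity, in increasing order."""
    out, p = [], 2
    while n > 1:
        while n % p == 0:
            out.append(p)
            n //= p
        p += 1
    return out


def finest_tower(m, m_prime):
    """Indices m = m_0 | m_1 | ... | m_r = m' with every ratio m_i/m_{i-1} prime."""
    assert m_prime % m == 0
    tower = [m]
    for p in prime_factors(m_prime // m):
        tower.append(tower[-1] * p)
    return tower


def automorphisms_per_level(tower):
    """Number phi(m_i)/phi(m_{i-1}) of automorphisms summed at each level."""
    return [totient(hi) // totient(lo) for lo, hi in zip(tower, tower[1:])]


tower = finest_tower(4, 180)
per_level = automorphisms_per_level(tower)
print(tower, per_level, sum(per_level), totient(180) // totient(4))
```

In this toy case the tower sums 9 automorphisms in total across its levels, whereas applying the trace in one shot would sum 24; the gap widens as $m'/m$ acquires more prime factors.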

For technical reasons related to the analysis of noise terms under automorphisms, we actually use the sum-automorphisms procedure only on levels $R^{(i)}/R^{(i-1)} = \mathcal{O}_{m_i}/\mathcal{O}_{m_{i-1}}$ of the tower where every odd prime dividing $m_i$ also divides $m_{i-1}$. Otherwise, we instead apply the scaled trace via an alternative procedure using ring-switching, which has essentially the same runtime (see Section 4.2.2 below for details). In fact, the alternative procedure can actually be used for any level of the tower, but it has the slight disadvantage of requiring the index of the ciphertext ring to be divisible by at least one prime that does not divide $m_i$; this is why we prefer not to use it when, e.g., $m_i$ is a power of two.


4.2.1  Details  of  the  Sum-Automorphisms  Procedure 

Here we specify the procedure for homomorphically applying the scaled trace by summing automorphisms, as sketched above. Let $R'/R = \mathcal{O}_{m'}/\mathcal{O}_m$ be a cyclotomic extension, where here $m, m'$ are just dummy indices, not necessarily the ones from above. As already mentioned, we require that every odd prime dividing $m'$ also divides $m$. The procedure takes as input a ciphertext $c''$ over some $R'' \supseteq R'$ that encrypts a plaintext $w' \in R'_{q'}$ under secret key $s'' \in R''$, where $q' = (\hat{m}'/\hat{m})q$ and $w'$ is divisible by $(g'/g)$. It proceeds as follows:

1. Compute ciphertexts $\tau_i(c'')$ over $R''$ for a certain set of automorphisms $\tau_i$ of $R''/R$ that induce the automorphisms of $R'/R$. These ciphertexts will respectively encrypt $\tau_i(w') \in R'_{q'}$ under secret key $\tau_i(s'')$. Then key-switch [BV11a, BGV12] these to ciphertexts $c^{(i)}$ encrypting $\tau_i(w')$ under a common secret key $s$. See below for further details.

2. Sum the ciphertexts $c^{(i)}$ (component-wise) to get a new ciphertext $c$ that encrypts (under secret key $s$) the plaintext $\mathrm{Tr}_{R'/R}(w') = \sum_i \tau_i(w') \in R_{q'}$, which is divisible by $\hat{m}'/\hat{m}$.

3. Using the procedure from Section 2.2.2, reduce the plaintext modulus to $q$, resulting in a ciphertext that encrypts the scaled trace $(\hat{m}/\hat{m}')\,\mathrm{Tr}_{R'/R}(w') \in R_q$ under $s$.
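Stripped of encryption and key-switching, the three steps compute the following on the plaintext. The sketch below is an illustration under simplifying assumptions: it works only with power-of-two cyclotomics, represented as $\mathbb{Z}[X]/(X^n + 1)$ with $m' = 2n$, and the function names `apply_automorphism` and `scaled_trace` are hypothetical. It is not the homomorphic procedure itself.

```python
def apply_automorphism(coeffs, j):
    """Apply X -> X^j to an element of Z[X]/(X^n + 1), n a power of two, j odd."""
    n = len(coeffs)
    out = [0] * n
    for k, a in enumerate(coeffs):
        e = (k * j) % (2 * n)          # X has order 2n because X^n = -1
        if e >= n:
            out[e - n] -= a            # X^e = -X^(e - n)
        else:
            out[e] += a
    return out


def scaled_trace(coeffs, m):
    """Scaled trace from O_{m'} (with m' = 2 * len(coeffs)) down to O_m.

    Assumes m and m' are powers of two with m dividing m'.
    """
    n = len(coeffs)
    m_prime = 2 * n
    # Automorphisms of O_{m'} that fix O_m: X -> X^j with j odd and j = 1 (mod m).
    js = [j for j in range(1, m_prime, 2) if (j - 1) % m == 0]
    total = [0] * n
    for j in js:                       # plaintext analogue of Steps 1 and 2
        image = apply_automorphism(coeffs, j)
        total = [t + c for t, c in zip(total, image)]
    scale = len(js)                    # phi(m') / phi(m)
    assert all(t % scale == 0 for t in total)
    return [t // scale for t in total] # plaintext analogue of Step 3


print(scaled_trace([3, 1, 4, 1], 4))
```

For $m' = 8$ and $m = 4$ the two automorphisms are $X \mapsto X$ and $X \mapsto X^5 = -X$, so the trace doubles the even-degree coefficients and cancels the odd-degree ones, and the scaling removes the factor of two.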

The correctness of Steps 2 and 3 is immediate, so we just need to give the details of Step 1. We need to choose automorphisms $\tau_i$ of $R''/R$ that induce the automorphisms of $R'/R$. Recall that the latter are defined by $\tau'_j(\zeta_{m'}) = \zeta_{m'}^j$ for all $j \in \mathbb{Z}^*_{m'}$ such that $j = 1 \pmod{m}$. For each such $j$, we choose an $i \in \mathbb{Z}^*_{m''}$ such that $i = j \pmod{m'}$ and such that $i$ is 1 modulo every prime $p$ that divides $m''$ but not $m'$; this is possible by the Chinese Remainder Theorem. Then $\tau_i(\zeta_{m''}) = \zeta_{m''}^i$ is an automorphism of $R''/R$ that induces $\tau'_j$, because $i = 1 \pmod{m}$ and

$$\tau_i(\zeta_{m'}) = \tau_i\big(\zeta_{m''}^{m''/m'}\big) = \zeta_{m''}^{(m''/m') \cdot i} = \zeta_{m'}^{i} = \zeta_{m'}^{j}.$$

Also, by our assumption on $m, m'$, each $i$ we use is 1 modulo every prime that divides $m''$, because every such prime either divides $m$, or does not divide $m'$, or is 2.
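The choice of the indices $i$ can be illustrated by a brute-force search. This is a sketch only: the function names are hypothetical, the search is quadratic and meant for toy indices, and the example values $m = 3$, $m' = 9$, $m'' = 45$ are assumptions picked to satisfy the requirement that every odd prime dividing $m'$ also divides $m$.

```python
from math import gcd


def prime_divisors(n):
    """Set of distinct primes dividing n."""
    out, p = set(), 2
    while n > 1:
        if n % p == 0:
            out.add(p)
            while n % p == 0:
                n //= p
        p += 1
    return out


def lift_automorphism_indices(m, m_prime, m_dprime):
    """For each unit j mod m' with j = 1 (mod m), find a unit i mod m'' such that
    i = j (mod m') and i = 1 modulo every prime dividing m'' but not m'."""
    assert m_prime % m == 0 and m_dprime % m_prime == 0
    extra = prime_divisors(m_dprime) - prime_divisors(m_prime)
    lifts = {}
    for j in range(1, m_prime):
        if gcd(j, m_prime) != 1 or (j - 1) % m != 0:
            continue
        for i in range(1, m_dprime):
            if (i % m_prime == j and gcd(i, m_dprime) == 1
                    and all(i % p == 1 for p in extra)):
                lifts[j] = i           # smallest valid lift; CRT guarantees one exists
                break
    return lifts


print(lift_automorphism_indices(3, 9, 45))
```

In the example, each lifted index is also 1 modulo both primes dividing $m'' = 45$, matching the closing remark of the paragraph above.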

To complete the details of Step 1, we need to show why the ciphertext $\tau_i(c'')$ encrypts $\tau_i(w') \in R'_{q'}$ under secret key $\tau_i(s'')$. This follows from the decryption relation for $c''$, and the fact that $\tau_i$ is a ring homomorphism that induces an automorphism of $R'$ and fixes $\mathbb{Z} \subseteq R$ pointwise:

Ti(cg)  +  Ti(d{)  ■  Ti(s ")  =  ^  •  Ti(ll)  +  Tj(e"), 

where the error term $e'' \in R''$ of $c''$ is such that $e'' \cdot g''$ is short (under the canonical embedding of $R''$).

The only subtlety is that we need $\tau_i(e'') \cdot g''$ to be short. We show below that $g'' = \tau_i(g'')$, from which it follows that $\tau_i(e'') \cdot g'' = \tau_i(e'' \cdot g'')$, which is short because the automorphisms of $R''$ simply permute the coordinates of the canonical embedding, and hence preserve norms (see, e.g., [LPR10, Lemma 5.6]). To see that $g'' = \tau_i(g'')$, recall that $i \in \mathbb{Z}^*_{m''}$ is 1 modulo every prime $p$ that divides $m''$. Therefore, $\tau_i$ fixes every $\zeta_p$ and hence also fixes $g''$.[6]

Lastly,  we  briefly  analyze  the  efficiency  of  the  procedure.  Applying  automorphisms  to  the  ciphertext  ring 
elements  is  a  trivial  linear-time  operation  in  the  dimension,  when  the  element  is  represented  in  any  of  the 
structured  bases  we  consider  (and  also  in  the  so-called  “Chinese  remainder”  basis).  Similarly,  key-switching 
is  quasilinear  time  in  the  bit  length  of  the  ciphertext,  which  itself  is  quasilinear  in  our  context. 


4.2.2  Applying  the  Trace  via  Ring-Switching 


Here we describe the alternative procedure for applying the scaled trace, which uses the ring-switching technique from [GHPS12] (see Proposition 2.3). Let $R'/R = \mathcal{O}_{m'}/\mathcal{O}_m$ be an arbitrary cyclotomic extension, where $m, m'$ are again dummy variables. For this procedure, we require that the ciphertext ring $R'' = \mathcal{O}_{m''} \supseteq R'$ be such that $m''/m'$ is coprime with $m'$, but otherwise we can choose $m''$ however we like. As before, the input is a ciphertext $c''$ over $R''$ that encrypts a plaintext $w' \in R'_{q'}$, where $w'$ is divisible by $(g'/g)$.

The main idea is that since $m'$ and $m''/m'$ are coprime, we can write $R'' \cong R' \otimes U$ where $U = \mathcal{O}_{m''/m'}$ and the tensor product is over the largest common base ring $\mathbb{Z}$. Then the $R$-linear function $\mathrm{Tr}_{R'/R}$ is induced by the $(R \otimes U)$-linear function $L \colon (R' \otimes U) \to (R \otimes U)$ defined by $L(a' \otimes u) = \mathrm{Tr}_{R'/R}(a') \otimes u$ for all $a' \in R'$, $u \in U$. So, using the ring-switching procedure from Proposition 2.3, we can homomorphically evaluate $L$ on ciphertext $c''$, yielding an encryption of $\mathrm{Tr}_{R'/R}(w')$, and then scale down the plaintext and its modulus as usual. One nice fact we highlight is that using ring-switching to evaluate the function $\mathrm{Tr}_{R'/R}$ does not incur any multiplicative increase in the noise rate, only a small additive one from the key-switching step. This is because the factor associated with the function $\mathrm{Tr}_{R'/R}$ that is applied to the ciphertext in the ring-switching procedure is simply 1.

One very important point is that ring-switching requires ring-LWE to be hard over the target ring $\mathcal{O}_{m'' \cdot m/m'} \cong R \otimes U$, so its dimension must be sufficiently large, but at the same time we cannot make the dimension of $R'' = \mathcal{O}_{m''}$ too large, for efficiency reasons. Therefore, we only use the procedure when $m'/m$ is small, and for sufficiently large $m''$. Note that if the $m''$ associated with a given input ciphertext is too small, we can trivially increase it by embedding into a larger cyclotomic ring.


[6] If, contrary to our assumption, $m'$ was divisible by one or more primes that did not divide $m$, then the error term $\tau_i(e'' \cdot g'')$ appearing in the ciphertext would be accompanied by a factor of $g''/\tau_i(g'')$. The expansion associated with this term can be bounded and is not excessive, but it depends on the number and sizes of the primes dividing $m'$ and not $m$. By contrast, the alternative procedure described in Section 4.2.2 incurs no multiplicative increase in the noise rate.

5  Homomorphic  Ring  Rounding


In this section we describe how to efficiently homomorphically evaluate the "ring rounding function" $\lfloor \cdot \rceil_p \colon R_q \to R_p$, where $R = \mathcal{O}_m$ is the $m$th cyclotomic ring. Conceptually, we follow the high-level strategy from [GHS12a], but instantiate it with very different technical components. Recall from Section 2.2.1 that the rounding function expresses its input $u$ in the "decryption" $\mathbb{Z}$-basis $B = \{b_j\}$ of $R$, as $u = \sum_j u_j \cdot b_j$ for $u_j \in \mathbb{Z}_q$, and outputs $\lfloor u \rceil_p := \sum_j \lfloor u_j \rceil_p \cdot b_j \in R_p$. Unlike with integer rounding from $\mathbb{Z}_q$ to $\mathbb{Z}_p$, it is not clear whether this rounding function has a low-depth arithmetic formula using just the ring operations of $R$. One difficulty is that there are an exponentially large number of values in $R_q$ that map to a given value in $R_p$, which might be seen as evidence that a corresponding arithmetic formula must have large depth. Fortunately, we show how to circumvent this issue by using an additional homomorphic operation, namely, an enhancement of ring-switching. In short, we reduce the homomorphic evaluation of the ring rounding function (from $R_q$ to $R_p$) very simply and efficiently to that of several parallel (batched) evaluations of the integer rounding function (from $\mathbb{Z}_q$ to $\mathbb{Z}_p$).


5.1  Overview 

Suppose we choose some cyclotomic ring $S = \mathcal{O}_\ell$ having a mod-$q$ CRT set $C = \{c_j\} \subseteq S$ of cardinality exactly $|B|$. That is, we have a ring embedding from the product ring $\mathbb{Z}_q^{|B|}$ into $S_q$, given by $\mathbf{u} \mapsto \sum_j u_j \cdot c_j$. Note that the choice of the ring $S$ is at our convenience, and need not have any relationship to the plaintext ring $R_q$. We express the rounding function $R_q \to R_p$ as a sequence of three steps:

1. Map $u = \sum_j u_j \cdot b_j \in R_q$ to $\sum_j u_j \cdot c_j \in S_q$, i.e., send the $\mathbb{Z}_q$-coefficients of $u$ (with respect to the decryption basis $B$) to the $\mathbb{Z}_q$-slots of $S_q$.

2. Batch-apply the integer rounding function from $\mathbb{Z}_q$ to $\mathbb{Z}_p$ to the slot values $u_j$, to get $\sum_j \lfloor u_j \rceil_p \cdot c_j \in S_p$.

3. Invert the map from the first step to obtain $\lfloor u \rceil_p = \sum_j \lfloor u_j \rceil_p \cdot b_j \in R_p$.

Using batch/SIMD operations [SV11], the second step is easily achieved using the fact that $S_q$ embeds the product of several copies of the ring $\mathbb{Z}_q$, via the CRT elements $c_j$. That is, we can simultaneously round all the coefficients $u_j$ to $\mathbb{Z}_p$, using just one evaluation of an arithmetic procedure over $S$ corresponding to one for the integer rounding function from $\mathbb{Z}_q$ to $\mathbb{Z}_p$.
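In the clear, the three steps reduce to rounding each coefficient independently. The sketch below shows only this plaintext behaviour; it represents a ring element by its list of coefficients with respect to $B$, uses the formula $\lfloor u \rceil_p = \lfloor u \cdot p/q \rceil \bmod p$ for integer rounding, and its function names and toy moduli ($q = 64$, $p = 4$) are assumptions made for illustration.

```python
def round_zq_to_zp(u, q, p):
    """Integer rounding from Z_q to Z_p: the nearest integer to u * p / q, mod p."""
    return ((2 * u * p + q) // (2 * q)) % p


def ring_round(coeffs, q, p):
    """Plaintext analogue of the three-step ring rounding.

    coeffs holds the Z_q-coefficients u_j of u with respect to the basis B.
    """
    slots = list(coeffs)                                  # Step 1: coefficients -> slots
    rounded = [round_zq_to_zp(u, q, p) for u in slots]    # Step 2: batched rounding
    return rounded                                        # Step 3: slots -> coefficients


# Noisy encodings (q/p) * mu_j + e_j of mu = (1, 3, 0, 2, 0), reduced mod q = 64.
noisy = [16 * 1 + 3, 16 * 3 - 2, 16 * 0 + 5, 16 * 2 - 7, (16 * 0 - 3) % 64]
print(ring_round(noisy, 64, 4))
```

The homomorphic version differs in that Steps 1 and 3 must be realized by ring-switching and Step 2 by a single SIMD evaluation, which is what the remainder of this section develops.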

We now describe one way of expressing the first and third steps above, in terms of operations that can be evaluated homomorphically. The first simple observation is that the function mapping $u = \sum_j u_j \cdot b_j$ to $\sum_j u_j \cdot c_j$ is induced by a $\mathbb{Z}$-linear function $\bar{L} \colon R \to S$. Specifically, $\bar{L}$ simply maps each $\mathbb{Z}$-basis element $b_j$ to $c_j$. Now suppose that we choose $S$ so that its largest common subring with $R$ is $\mathbb{Z}$, i.e., the indices $m, \ell$ are coprime. Then letting $T = R + S = \mathcal{O}_{m\ell} \cong R \otimes S$ be the compositum ring, Lemma 2.2 yields an $S$-linear function $L \colon T \to S$ that coincides with $\bar{L}$ on $R \subseteq T$, and in particular on $u$. The ring-switching procedure from Proposition 2.3 can homomorphically evaluate any $S$-linear function from $T$ to $S$, and in particular, the function $L$. Therefore, by simply embedding $R$ into $T$, we can homomorphically evaluate $\bar{L}(x) = L(x)$ by applying the ring-switching procedure with $L$.

Unfortunately, there is a major problem with the efficiency of the above approach: the dimension (over $\mathbb{Z}$) of the compositum ring $T$ is the product of those of $R$ and $S$, which are each at least linear in the security parameter. Therefore, representing and operating on arbitrary elements in $T$ requires at least quadratic time.

5.1.1  Efficiently  Mapping  from  B  to  C


In hindsight, the quadratic runtime of the above approach should not be a surprise, because we treated $\bar{L} \colon R \to S$ as an arbitrary $\mathbb{Z}$-linear transformation, and $B$, $C$ as arbitrary sets. To do better, $\bar{L}$, $B$, and $C$ must have some structure we can exploit. Fortunately, they can, if we choose them carefully. We now describe a way of expressing the first and third steps above in terms of simple operations that can be evaluated homomorphically in quasilinear time.

The main idea is as follows, and is summarized in Figure 1. Instead of mapping directly from $R$ to $S$, we will express $\bar{L}$ as a sequence of linear transformations $\bar{L}_1, \ldots, \bar{L}_r$ through several "hybrid" cyclotomic rings $R = H^{(0)}, H^{(1)}, \ldots, H^{(r)} = S$. For sets $B$ and $C$ with an appropriate product structure, these transformations will respectively map $A_0 = B \subseteq H^{(0)}$ to some structured subset $A_1 \subseteq H^{(1)}$, then $A_1$ to some structured subset $A_2 \subseteq H^{(2)}$, and so on, finally mapping $A_{r-1}$ to $A_r = C \subseteq H^{(r)}$. In contrast to the inefficient method described above, the hybrid rings will be chosen so that each compositum $T^{(i)} = H^{(i-1)} + H^{(i)}$ of adjacent rings has dimension just slightly larger (by only a polylogarithmic factor) than that of $R$. This is achieved by choosing the indices of $H^{(i-1)}, H^{(i)}$ to have large greatest common divisor, and hence small least common multiple. For example, the indices can share almost all the same prime divisors (with multiplicity), and have just one different prime divisor each. Of course, other tradeoffs between the number of hybrid rings and the dimensions of the compositums are also possible.

The flip side of this approach is that using ring-switching, we can homomorphically evaluate only $E^{(i)}$-linear functions $\bar{L}_i \colon H^{(i-1)} \to H^{(i)}$, where $E^{(i)} = H^{(i-1)} \cap H^{(i)}$ is the largest common subring of adjacent hybrid rings. Since each $E^{(i)}$ is large by design (to keep the compositum $T^{(i)}$ small), this requirement is quite strict, yet we still need to construct linear functions $\bar{L}_i$ that sequentially map $B = A_0$ to $C = A_r$. To achieve this, we construct all the sets $A_i$ to have appropriate product structure. Specifically, we ensure that for each $i = 1, \ldots, r$, we have factorizations

$$A_{i-1} = A^{\mathrm{out}}_{i-1} \cdot Z_i, \qquad A_i = A^{\mathrm{in}}_i \cdot Z_i \tag{5.1}$$

for some set $Z_i \subseteq E^{(i)}$, where both $A^{\mathrm{out}}_{i-1}$ and $A^{\mathrm{in}}_i$ are linearly independent over $E^{(i)}$. (Note that for each $i = 1, \ldots, r-1$, the set $A_i$ needs to factor in two ways over two subrings $E^{(i)}$ and $E^{(i+1)}$, which is why we need two sets $A^{\mathrm{in}}_i$ and $A^{\mathrm{out}}_i$.) Then, we simply define $\bar{L}_i$ to be an arbitrary $E^{(i)}$-linear function that bijectively maps $A^{\mathrm{out}}_{i-1}$ to $A^{\mathrm{in}}_i$. (Note that $A^{\mathrm{out}}_{i-1}$ and $A^{\mathrm{in}}_i$ have the same cardinality, because $A_{i-1}$ and $A_i$ do.) It immediately follows that $\bar{L}_i$ bijectively maps $A_{i-1}$ to $A_i$, because

$$\bar{L}_i(A_{i-1}) = \bar{L}_i(A^{\mathrm{out}}_{i-1} \cdot Z_i) = \bar{L}_i(A^{\mathrm{out}}_{i-1}) \cdot Z_i = A^{\mathrm{in}}_i \cdot Z_i$$

by $E^{(i)}$-linearity and the fact that $Z_i \subseteq E^{(i)}$.

Summarizing the above discussion, we have the following theorem.

Theorem 5.1. Suppose there exists a sequence of cyclotomic rings $R = H^{(0)}, H^{(1)}, \ldots, H^{(r)} = S$ and sets $A_i \subseteq H^{(i)}$ such that for all $i = 1, \ldots, r$, we have $A_{i-1} = A^{\mathrm{out}}_{i-1} \cdot Z_i$ and $A_i = A^{\mathrm{in}}_i \cdot Z_i$ for some sets $Z_i \subseteq E^{(i)} = H^{(i-1)} \cap H^{(i)}$ and $A^{\mathrm{out}}_{i-1}, A^{\mathrm{in}}_i$ that are each $E^{(i)}$-linearly independent and of equal cardinality. Then there is a sequence of $E^{(i)}$-linear maps $\bar{L}_i \colon H^{(i-1)} \to H^{(i)}$, for $i = 1, \ldots, r$, whose composition $\bar{L}_r \circ \cdots \circ \bar{L}_1$ bijectively maps $A_0$ to $A_r$.

5.1.2  Applying  the  Map  Homomorphically 

So far we have described how our desired map between plaintext rings $R$ and $S$ can be expressed as a sequence of linear maps through hybrid plaintext rings. In the context of bootstrapping, for security these plaintext rings typically need to be embedded in some larger ciphertext rings, because the dimensions of $R, S$ are not large enough to securely support the very small noise used in bootstrapping. For example, following Step 2 of our bootstrapping procedure (Section 3), we have a ciphertext over the ring $R''$ where $R'' = \mathcal{O}_{m''} \supseteq R$ for some $m''$ of our choice that is divisible by $m$. We need to choose the sequence of hybrid ciphertext rings so that they admit linear functions (over the respective largest common subrings) that induce the desired ones on the underlying plaintext rings. This turns out to be very easy to do, as we now explain.

Figure 1: An example mapping from $B \subseteq R$ to $C \subseteq S$, via a sequence of hybrid rings. Each $E^{(i)} = H^{(i-1)} \cap H^{(i)}$ is a largest common subring, and each $T^{(i)} = H^{(i-1)} + H^{(i)}$ is a compositum, of adjacent hybrid rings. For any $E^{(i)}$-linear function from $H^{(i-1)}$ to $H^{(i)}$, there is a corresponding $H^{(i)}$-linear function from $T^{(i)}$ to $H^{(i)}$ that coincides with it on $H^{(i-1)}$ (see Lemma 2.2).

Let $H, H'$ be adjacent hybrid plaintext rings having largest common subring $E = H \cap H'$ and compositum $T = H + H'$. Then we want the corresponding ciphertext rings to be $\tilde{H} = H \otimes U$, $\tilde{H}' = H' \otimes U'$ for some cyclotomic rings $U, U'$, and the largest common subring and compositum of $\tilde{H}, \tilde{H}'$ to be $\tilde{E} = E \otimes (U \cap U')$ and $\tilde{T} = T \otimes (U + U')$, respectively (where all the tensor products are over the common base ring $\mathbb{Z}$). Then any $E$-linear function $\bar{L} \colon H \to H'$ is induced by any $\tilde{E}$-linear function $L \colon \tilde{H} \to \tilde{H}'$ satisfying $L(h \otimes 1) = \bar{L}(h) \otimes 1$, which is the function we actually apply when switching between ciphertext rings.

To satisfy the above conditions, it is sufficient (and in fact necessary) to choose the respective indices $u, u'$ of $U, U'$ so that $\mathrm{lcm}(u, u')$ is coprime with $\mathrm{lcm}(h, h')$, where $h, h'$ are the respective indices of $H, H'$. Then the ciphertext rings $\tilde{H}, \tilde{H}'$ have indices $hu$ and $h'u'$, and their compositum has index $\mathrm{lcm}(h, h') \cdot \mathrm{lcm}(u, u')$, which must be quasilinear for asymptotic efficiency. In typical instantiations, in order to get enough additional slots in each successive ring, $h'/h$ will be moderately large and $\mathrm{lcm}(h, h') \approx h'$. So to ensure that all the ciphertext rings are about the same size, we can choose $u/u' \approx h'/h$ and $\mathrm{lcm}(u, u') \approx u$.
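The index bookkeeping in the last paragraph can be checked numerically. The sketch below is an illustration only: the function name `ciphertext_ring_indices` and the sample indices are assumptions, and it relies on the standard facts that the intersection and compositum of the cyclotomic rings of indices $a$ and $b$ have indices $\gcd(a, b)$ and $\mathrm{lcm}(a, b)$.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def ciphertext_ring_indices(h, h_prime, u, u_prime):
    """Indices of the ciphertext rings, their common subring, and their compositum."""
    # Required condition: lcm(u, u') coprime with lcm(h, h').
    assert gcd(lcm(u, u_prime), lcm(h, h_prime)) == 1
    return {
        "H~": h * u,
        "H~'": h_prime * u_prime,
        "E~": gcd(h, h_prime) * gcd(u, u_prime),
        "T~": lcm(h, h_prime) * lcm(u, u_prime),
    }


idx = ciphertext_ring_indices(12, 18, 35, 5)
print(idx)
# Under the coprimality condition, these agree with the gcd and lcm of the two indices.
print(gcd(idx["H~"], idx["H~'"]), lcm(idx["H~"], idx["H~'"]))
```

With plaintext indices 12 and 18 and auxiliary indices 35 and 5, the common subring and compositum of the ciphertext rings have indices 30 and 1260, which factor as $\gcd(12, 18) \cdot \gcd(35, 5)$ and $\mathrm{lcm}(12, 18) \cdot \mathrm{lcm}(35, 5)$.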

5.1.3  Mapping  Selected  Coefficients 

In some settings we may only need to map certain coefficients into slots, i.e., map a particular portion of $B$ to a CRT set of appropriate size. For example, when bootstrapping a semi-packed ciphertext over $R' = \mathcal{O}_{m'}$ with plaintext over $\tilde{R} = \mathcal{O}_{\tilde{m}}$, we may need to artificially expand our view of the plaintext ring to some $R = \mathcal{O}_m$, so that $m$ is coprime with $m'/m$ (see the constraints listed at the start of Section 3). In such a case, the decryption basis $B$ of $R$ factors as $B = B' \cdot \tilde{B}$, where $\tilde{B}$ is the decryption basis of $\tilde{R}$ and $B' \subseteq R$ is a particular $\tilde{R}$-basis of $R$. Since the true message is really only over $\tilde{R}$, it can be shown that the only coefficients we need to recover the message are associated with the subset $b' \cdot \tilde{B} \subseteq B$ for a particular fixed $b' \in B'$. Therefore, when designing the hybrid rings and CRT sets we only need $|\tilde{B}|$ slots in total. When initially switching from $R$ through the hybrid rings, we do so in a way that maps $b'$ to one entry of a CRT set and all the other elements of $B'$ to zero, then continue by mapping all of $\tilde{B}$ to a CRT set as usual. Note that we still need to go through just as many hybrid rings to map from $R$ to $S$, but the size of $S$ can be significantly smaller because it needs fewer CRT slots.


5.2  Construction 

By Theorem 5.1 and the ring-switching procedure, in order to map $B \subseteq R$ to a CRT set $C$ of some ring $S$ of our choice in a way that can be efficiently evaluated homomorphically, we just need to construct hybrid cyclotomic rings $R = H^{(0)}, \ldots, H^{(r)} = S$ and sets $A_i \subseteq H^{(i)}$ (where $A_0 = B$ and $A_r = C$) to satisfy the following two properties for each $i = 1, \ldots, r$:

1. Each compositum $T^{(i)} = H^{(i-1)} + H^{(i)}$ is not too large, i.e., its dimension is quasilinear.

2. The sets $A_{i-1}, A_i$ factor as described in Equation (5.1).

The  remainder  of  this  subsection  is  dedicated  to  providing  such  a  construction. 


5.2.1  Decomposition  of  $R$  and  Basis  $B \subseteq R$

For our given ring $R = \mathcal{O}_m$ and its $\mathbb{Z}$-basis $B$ used in decryption, we consider a tower of cyclotomic rings

$$R^{(r)} / R^{(r-1)} / \cdots / R^{(1)} / R^{(0)},$$

where $R^{(r)} = R$ and $R^{(0)} = \mathcal{O}_1 = \mathbb{Z}$, and each $R^{(i)}$ has index $m_i$, which is divisible by $m_{i-1}$ for $i > 0$. For example, in a finest-grained decomposition, $r$ is the number of prime divisors (with multiplicity) of $m$, and the ratios $m_i/m_{i-1}$ are all these prime divisors in some arbitrary order. A coarser-grained decomposition may be used as well, but will tend to make the compositum rings $T^{(i)}$ larger.

We need $\mathbb{Z}$-bases $B_i$ of the rings $R^{(i)}$ that have a product structure induced by the tower. Specifically, for each $i = 1, \ldots, r$ we need to have the factorization

$$B_i = B'_i \cdot B_{i-1} \subseteq R^{(i)} \tag{5.2}$$

for some set $B'_i \subseteq R^{(i)}$ that is linearly independent over $R^{(i-1)}$. We also need the basis $B_r$ of $R = R^{(r)}$ to be the one used for decryption. As shown in Section 2.1.4, the scaled-up "decoding" basis of $R$ has a finest-possible factorization, so we can use it as $B$ for any choice of the tower.

We mention that the power basis $\{1, \zeta_m, \zeta_m^2, \ldots, \zeta_m^{\varphi(m)-1}\}$ of $R$, which is implicitly the one used when representing $R$ as the polynomial ring $\mathbb{Z}[X]/(\Phi_m(X))$, does not have the required product structure when $m$ is divisible by two or more odd primes, but that it does coincide with the scaled-up decoding basis when $m$ is a power of 2. (See [LPR13] for details.)


5.2.2  Ring  $S$  and  CRT  Set  $C \subseteq S$


We next design $S = \mathcal{O}_\ell$ so that it also yields a tower of cyclotomic rings $S^{(r)} / S^{(r-1)} / \cdots / S^{(1)} / S^{(0)}$, where $S^{(r)} = S$ and $S^{(0)} = \mathbb{Z}$, and $S^{(i)}$ has index $\ell_i$. As described in Sections 2.1.3 and 2.1.4, there are structured mod-$q$ CRT sets $\tilde{C}_i$ of $S^{(i)}$ that factor as

$$\tilde{C}_i = \tilde{C}'_i \cdot \tilde{C}_{i-1},$$

where $\tilde{C}'_i \subseteq S^{(i)}$ is an $S^{(i-1)}$-linearly independent set whose cardinality is the "relative splitting number" of $p$ in $S^{(i)}/S^{(i-1)}$, i.e., the number of distinct prime ideals in $S^{(i)}$ lying over any prime ideal divisor of $p$ in $S^{(i-1)}$.

We  need  to  choose  the  ring  S  and  its  tower  so  that  for  all  i  =  1, . . . ,  r, 

•  the respective indices $m_{r-i+1}, \ell_i$ of $R^{(r-i+1)}, S^{(i)}$ are coprime (certainly it suffices for $m$ and $\ell$ to be coprime, but this is not always necessary);

•  the dimension $\varphi(m_{r-i+1} \cdot \ell_i)$ is not too large (specifically, it is quasi-linear in the security parameter);

•  the relative splitting number $|\tilde{C}'_i| \geq |B'_{r-i+1}|$.
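The third condition can be checked mechanically for candidate towers. The sketch below rests on two standard facts that are assumptions relative to this text: for a prime $p$ not dividing $\ell$, the number of prime ideals over $p$ in $\mathcal{O}_\ell$ is $\varphi(\ell)/\mathrm{ord}_\ell(p)$, and $|B'_i| = \varphi(m_i)/\varphi(m_{i-1})$ because $B_i$ is a $\mathbb{Z}$-basis of $R^{(i)}$. The function names and the toy towers are hypothetical.

```python
from math import gcd


def totient(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def mult_order(p, n):
    """Multiplicative order of p modulo n (requires gcd(p, n) == 1)."""
    if n == 1:
        return 1
    k, x = 1, p % n
    while x != 1:
        x = (x * p) % n
        k += 1
    return k


def splitting_number(p, ell):
    """Number of prime ideals over p in the ell-th cyclotomic ring, p not dividing ell."""
    assert gcd(p, ell) == 1
    return totient(ell) // mult_order(p, ell)


def relative_splitting_numbers(p, ell_tower):
    g = [splitting_number(p, ell) for ell in ell_tower]
    return [hi // lo for lo, hi in zip(g, g[1:])]


def slot_condition_holds(p, m_tower, ell_tower):
    """Check the third bullet: relative splitting number at level i is at least |B'_{r-i+1}|."""
    r = len(m_tower) - 1
    rel = relative_splitting_numbers(p, ell_tower)
    need = [totient(m_tower[r - i + 1]) // totient(m_tower[r - i]) for i in range(1, r + 1)]
    return all(c >= b for c, b in zip(rel, need))


print(relative_splitting_numbers(2, [1, 7, 217]))
print(slot_condition_holds(2, [1, 5, 15], [1, 7, 217]))
```

Here the $S$-tower with indices $1, 7, 217$ offers relative splitting numbers 2 and 6 for $p = 2$, which suffices for the $R$-tower $1, 5, 15$ (needing 2 and 4) but not for $1, 3, 9$ (needing 3 and 2).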

We can then easily define structured CRT sets $C_i \subseteq \tilde{C}_i \subseteq S^{(i)}$ of the appropriate cardinality, and in particular $C = C_r$, as follows. Define $C_0 = \{1\} \subseteq \mathbb{Z} = S^{(0)}$. Then for each $i = 1, \ldots, r$, let $C'_i \subseteq \tilde{C}'_i$ be an arbitrary subset having cardinality exactly $|B'_{r-i+1}|$, and define

$$C_i = C'_i \cdot C_{i-1} \subseteq \tilde{C}_i. \tag{5.3}$$

5.2.3  Hybrid  Rings  $H^{(i)}$  and  Sets  $A_i \subseteq H^{(i)}$

Informally, with each successive hybrid ring we remove another level from the $R$-tower and add on another level to the $S$-tower, and similarly with the corresponding components of the structured sets $B$ and $C$. Formally, for $i = 0, 1, \ldots, r$ we define

$$H^{(i)} = \mathcal{O}_{m_{r-i}\ell_i} \cong R^{(r-i)} \otimes S^{(i)}, \tag{5.4}$$

$$A_i = B_{r-i} \cdot C_i \subseteq H^{(i)},$$

where the tensor product in Equation (5.4) applies to the rings as extensions of $\mathbb{Z}$, and the isomorphism holds because $\gcd(m_{r-i}, \ell_i) \le \gcd(m_{r-i+1}, \ell_i) = 1$ by design. Note that $H^{(0)} = \mathcal{O}_{m_r} = R$, $H^{(r)} = \mathcal{O}_{\ell_r} = S$, and $A_0 = B_r = B$, $A_r = C_r = C$, as required.
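As a small sanity check of these definitions, the following sketch computes the indices of the hybrid rings $H^{(i)}$ for toy towers, together with the product formulas $m_{r-i}\ell_{i-1}$ and $m_{r-i+1}\ell_i$ that one expects for the largest common subring and compositum of adjacent hybrid rings. The function name `hybrid_indices` and the towers $(1, 4, 8)$ and $(1, 3, 9)$ are assumptions made for illustration; the comparison uses the standard fact that intersections and composita of cyclotomic rings correspond to the gcd and lcm of their indices.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def hybrid_indices(m_tower, ell_tower):
    """Indices of H^(i) for i = 0..r, and of E^(i), T^(i) for i = 1..r.

    m_tower = [m_0, ..., m_r] and ell_tower = [l_0, ..., l_r], with m_{r-i+1}
    coprime to l_i for every i.
    """
    r = len(m_tower) - 1
    assert len(ell_tower) == r + 1
    assert all(gcd(m_tower[r - i + 1], ell_tower[i]) == 1 for i in range(1, r + 1))
    H = [m_tower[r - i] * ell_tower[i] for i in range(r + 1)]
    E = [m_tower[r - i] * ell_tower[i - 1] for i in range(1, r + 1)]
    T = [m_tower[r - i + 1] * ell_tower[i] for i in range(1, r + 1)]
    return H, E, T


H, E, T = hybrid_indices([1, 4, 8], [1, 3, 9])
print(H, E, T)
# The product formulas agree with gcd / lcm of adjacent hybrid indices.
print([gcd(a, b) for a, b in zip(H, H[1:])], [lcm(a, b) for a, b in zip(H, H[1:])])
```

In this example the hybrid rings have indices 8, 12, 9, so the path from $R = \mathcal{O}_8$ to $S = \mathcal{O}_9$ passes through composita of indices 24 and 36, rather than the single compositum of index 72 that the direct approach would need.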

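For each $i = 1, \ldots, r$, because $m_{r-i+1}$ and $\ell_i$ are coprime, it is straightforward to verify that the largest common subring $E^{(i)} = H^{(i-1)} \cap H^{(i)}$ and compositum $T^{(i)} = H^{(i-1)} + H^{(i)}$ are

$$E^{(i)} = \mathcal{O}_{m_{r-i}\ell_{i-1}} \cong R^{(r-i)} \otimes S^{(i-1)},$$

$$T^{(i)} = \mathcal{O}_{m_{r-i+1}\ell_i} \cong R^{(r-i+1)} \otimes S^{(i)},$$

where the tensor products above are over the common base ring $\mathbb{Z}$. Note that the dimension of $T^{(i)}/\mathbb{Z}$ is $\varphi(m_{r-i+1} \cdot \ell_i)$, which is quasi-linear in the security parameter by construction.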

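Lemma 5.2. The sets $A_{i-1}, A_i$ factor as in Equation (5.1), i.e., $A_{i-1} = A^{\mathrm{out}}_{i-1} \cdot Z_i$ and $A_i = A^{\mathrm{in}}_i \cdot Z_i$ for some sets $Z_i \subseteq E^{(i)}$ and $A^{\mathrm{out}}_{i-1}, A^{\mathrm{in}}_i$ that are each $E^{(i)}$-linearly independent and of equal cardinality.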

Proof. Define $Z_i = B_{r-i} \cdot C_{i-1} \subseteq E^{(i)}$. Recall from Equation (5.2) that $B_{r-i+1} = B'_{r-i+1} \cdot B_{r-i}$, where $B'_{r-i+1} \subseteq R^{(r-i+1)}$ is linearly independent over $R^{(r-i)}$, and hence also over $E^{(i)} \cong R^{(r-i)} \otimes S^{(i-1)}$ (because it corresponds to the set of pure tensors $B'_{r-i+1} \otimes \{1\} \subseteq R^{(r-i+1)} \otimes S^{(i-1)}$). Then

$$A_{i-1} = (B'_{r-i+1} \cdot B_{r-i}) \cdot C_{i-1} = B'_{r-i+1} \cdot Z_i$$

is the desired factorization. Similarly, recall from Definition (5.3) that $C_i = C'_i \cdot C_{i-1}$, where $C'_i \subseteq \tilde{C}'_i \subseteq S^{(i)}$ is linearly independent over $S^{(i-1)}$, and hence also over $E^{(i)}$. Then we have the desired factorization

$$A_i = B_{r-i} \cdot (C'_i \cdot C_{i-1}) = C'_i \cdot Z_i.$$

Finally, we have $|A^{\mathrm{out}}_{i-1}| = |B'_{r-i+1}| = |C'_i| = |A^{\mathrm{in}}_i|$ by design of $C'_i$. □

A  Transformation  Between  LSB  and  MSB  Encodings


Here  we  describe  a  folklore  transformation  between  the  “least  significant  bit”  and  “most  significant  bit” 
message  encodings  for  (ring-)LWE-based  cryptosystems. 

Let plaintext modulus p and ciphertext modulus q be coprime, fix integers c_p, c_q such that c_p·p + c_q·q = 1,
and observe that c_p = p^{-1} (mod q) and c_q = q^{-1} (mod p).


• An lsb encoding of a value µ ∈ Z_p is any v ∈ Z_q such that v = e (mod q) for some integer
e ∈ [−q/2, q/2) where e = µ (mod p).

• An msb encoding of µ is any w ∈ Z_q such that ⌊w⌉_p := ⌊w · (p/q)⌉ = µ (mod p).

If v ∈ Z_q is an lsb encoding of µ ∈ Z_p, then we claim that p^{-1} · v ∈ Z_q is an msb encoding of
−q^{-1} · µ ∈ Z_p. Indeed, since v = e (mod q) for some e ∈ (µ + pZ) ∩ [−q/2, q/2), we have

⌊p^{-1} · v⌉_p = ⌊((1 − c_q·q)/p) · e · (p/q)⌉ = ⌊e/q − c_q · e⌉ = −c_q · e = −q^{-1} · µ (mod p),

where the rounding discards the term e/q ∈ [−1/2, 1/2).

In the other direction, if w ∈ Z_q is an msb encoding of µ ∈ Z_p, then we claim that p · w is an lsb encoding
of −q · µ ∈ Z_p. Indeed, by assumption we have

⌊w⌉_p = ⌊w · (p/q)⌉ = w · (p/q) − f = µ (mod p)

for some f ∈ (1/q)Z ∩ [−1/2, 1/2). Multiplying by q and letting e = q · f ∈ Z ∩ [−q/2, q/2), we have

p · w − e = q · µ (mod pq).

Reducing this modulo q, we get p · w = e (mod q), and reducing it modulo p, we have e = −q · µ (mod p).

The above facts make it possible to convert between lsb and msb representations of (ring-)LWE ciphertexts,
simply by multiplying the ciphertext by p or p^{-1} modulo q. This works because decryption recovers a
Z_q-encoding of the message simply as a linear function of the ciphertext, so the p or p^{-1} factor simply "passes
through" the ciphertext to the encoding. (In the ring setting, the encoding of plaintext ring elements is
coefficient-wise in a certain basis, so the same reasoning applies.) If q = −1 (mod p), then the above
transformations preserve the message exactly. In other cases, we can just keep track of the factors of −q or
−q^{-1} introduced by the conversions (which may be affected by other homomorphic operations), and remove
them upon decryption.
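The two conversions can be checked exhaustively on small moduli. The following is a minimal sketch on plain integers rather than ciphertexts; the function names and the toy moduli are illustrative assumptions, and rounding is taken as ⌊x⌉ = ⌊x + 1/2⌋.

```python
def centered(v, q):
    """Representative of v mod q in the interval [-q/2, q/2)."""
    r = v % q
    return r - q if 2 * r >= q else r

def lsb_decode(v, p, q):
    """Message carried by an lsb encoding v: centered representative mod p."""
    return centered(v, q) % p

def msb_decode(w, p, q):
    """Message carried by an msb encoding w: round(w * p / q) mod p."""
    return ((2 * (w % q) * p + q) // (2 * q)) % p

def lsb_to_msb(v, p, q):
    """Multiply by p^{-1} modulo q (requires gcd(p, q) = 1)."""
    return (pow(p, -1, q) * v) % q

def msb_to_lsb(w, p, q):
    """Multiply by p modulo q."""
    return (p * w) % q

def check_conversions(p, q):
    """Verify both claims for every element of Z_q."""
    q_inv = pow(q, -1, p)
    for v in range(q):
        mu = lsb_decode(v, p, q)
        # lsb encoding of mu  ->  msb encoding of -q^{-1} * mu
        if msb_decode(lsb_to_msb(v, p, q), p, q) != (-q_inv * mu) % p:
            return False
    for w in range(q):
        mu = msb_decode(w, p, q)
        # msb encoding of mu  ->  lsb encoding of -q * mu
        if lsb_decode(msb_to_lsb(w, p, q), p, q) != (-q * mu) % p:
            return False
    return True
```

For example, `check_conversions(5, 97)` exercises the general case, while `check_conversions(3, 17)` has q = −1 (mod p), so both extra factors equal 1 and the message is preserved.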


B  Integer  Rounding  Procedure 

Here we recall (a close variant of) the efficient arithmetic procedure from [GHS12a] for computing the "most
significant bit" function msb_q : Z_q → Z_2 for q = 2^ℓ ≥ 4, defined as msb_q(x) = ⌊x/(q/2)⌋. Note that
the integer rounding function ⌊·⌉_2 : Z_q → Z_2 is simply ⌊x⌉_2 = msb_q(x + q/4). The multiplicative depth
and cost (in number of operations) of the msb_q procedure are not precisely analyzed in [GHS12a], and the
procedure as written turns out to be suboptimal in depth and number of operations by log_2(q) factors, because
it (homomorphically) raises ciphertexts to large powers in an inner loop. So for completeness, here we
present a simplified and optimized version of the procedure, and an analysis of its depth and cost. It uses the
standard ring operations of Z_{2^j}, as well as division by 2 of values that are guaranteed to be even. All of these
operations can be evaluated homomorphically for the cryptosystem described in Section 2.2, as explained in
Section 2.2.2. The procedure also easily generalizes to any prime base.


Algorithm 1 Arithmetic procedure for computing msb_q : Z_q → Z_2 [GHS12a]

Input: Element x ∈ Z_q, where q = 2^ℓ for some positive integer ℓ
Output: msb_q(x) ∈ Z_2

1: w_0 ← x                                    // w_0 ∈ Z_q
2: for i ← 1, ..., ℓ − 1 do
3:     y ← x                                  // y ∈ Z_q, y = x (mod 2^{i+1})
4:     for j ← 0, ..., i − 1 do
5:         w_j ← w_j^2                        // now w_j = lsb(⌊x/2^j⌋) (mod 2^{i−j+1})
6:         y ← (y − w_j)/2 mod (q/2^{j+1})    // now y ∈ Z_{q/2^{j+1}}, y = ⌊x/2^{j+1}⌋ (mod 2^{i−j})
7:     w_i ← y                                // w_i ∈ Z_{q/2^i}, w_i = ⌊x/2^i⌋ (mod 2)
8: return w_{ℓ−1} ∈ Z_2

Correctness follows from [GHS12a, Lemma 2]. The main idea is that when initially assigned, each w_j
has the same least-significant bit as ⌊x/2^j⌋, i.e., w_j = ⌊x/2^j⌋ (mod 2) (but its other bits may not agree
with x's). Each time w_j is squared in Step 5, its least-significant bit remains the same, but an additional
more-significant bit is set to zero. That is, after t squarings, w_j = lsb(⌊x/2^j⌋) (mod 2^{t+1}). Therefore, in
iteration i, the inner loop "shifts away" the i least-significant bits of x, leaving the (i + 1)st bit intact in the
least significant position (but possibly changing the others), at which point we can assign w_i and maintain the
invariant.

We now briefly analyze the homomorphic evaluation of the procedure, in terms of its induced noise
growth and runtime cost. The most important observation is that although it is written using a doubly nested
loop, the procedure actually has multiplicative depth exactly ℓ − 1 = log_2(q/2). This is because in the inner
loop, each w_j for j = 0, ..., i − 1 can be squared in parallel (Step 5). Each squaring of the plaintext value
w_j ∈ Z_{q/2^j} induces the usual small polynomial expansion (q/2^j) · n^c (where c ≈ 1) in the noise rate of the
associated ciphertext. The iterated subtractions and divisions by 2 (Step 6) cause no growth at all in the noise
rate: each subtraction sums (at worst) the noise rates of the associated ciphertexts, and division by 2 halves
the noise rate.

In the ith iteration, the procedure performs i homomorphic multiplications and i subtractions (and also i
divisions by 2, but these are trivial as homomorphic operations). Therefore, the procedure uses a total of
ℓ(ℓ − 1)/2 homomorphic multiplications and subtractions each.
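The procedure and its operation count can be simulated directly on plaintext integers. The sketch below is an illustration only: it performs no homomorphic evaluation, and the function names and the squaring counter are assumptions added for the check.

```python
def msb_q(x, ell):
    """Simulate Algorithm 1 on a plaintext x in Z_q, q = 2**ell.

    Returns (msb_q(x), number of squarings performed).
    """
    q = 1 << ell
    w = [x % q]                       # Step 1: w_0 in Z_q
    squarings = 0
    for i in range(1, ell):           # Step 2
        y = x % q                     # Step 3
        for j in range(i):            # Step 4
            mod_j = q >> j            # w_j lives in Z_{q / 2^j}
            w[j] = (w[j] * w[j]) % mod_j        # Step 5: squaring
            squarings += 1
            diff = (y - w[j]) % mod_j           # even, by the invariant
            y = (diff // 2) % (mod_j >> 1)      # Step 6: now in Z_{q / 2^(j+1)}
        w.append(y)                   # Step 7: w_i in Z_{q / 2^i}
    return w[ell - 1] % 2, squarings  # Step 8

def round2(x, ell):
    """Integer rounding Z_q -> Z_2 via msb_q(x + q/4); needs ell >= 2."""
    q = 1 << ell
    return msb_q(x + q // 4, ell)[0]
```

For q = 8 (ℓ = 3), `msb_q(5, 3)` returns `(1, 3)`: the top bit of 101 is 1, using ℓ(ℓ − 1)/2 = 3 squarings.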

C  Concrete  Choices  of  Rings 

Here, for p = 2 and several values of the original cyclotomic index m, we give some workable values for the
target cyclotomic index ℓ, along with the indices of the intermediate "hybrid" rings, the dimensions of the
compositum rings, etc. Note that when m_{r-i+1} = 2, then the ring R^{(r-i+1)} has dimension 1, and so we can
move directly from m_{r-i+1} = 2 to m_{r-i+1} = 1. In the tables below, and following the notation in Section 5:

• m_{r-i+1} is the index of the ring R^{(r-i+1)} at step i;




• ℓ_i is the index of the ring S^{(i)} at step i;

• φ(m_{r-i+1} · ℓ_i) is the dimension of the compositum ring at step i;

• |B'_{r-i+1}| is the dimension of the intermediate ring extension R^{(r-i+1)}/R^{(r-i)};

• |C'_i| is the "relative splitting number" of p = 2 in the extension S^{(i)}/S^{(i-1)};

• r denotes the number of hybrid rings R^{(r-i+1)}, i ∈ [r].


All the indices are lower bounds needed to support the functionality of the ring-rounding technique on
the plaintext space (Section 5). Larger ciphertext indices may be required to ensure adequate security for all
the homomorphic operations; see Section 5.1.2.


Table 1: Concrete choices for m_r = 1024, φ(m_r) = 512

Step i   m_{r-i+1}   ℓ_i                 |B'_{r-i+1}|   |C'_i|   φ(m_{r-i+1} · ℓ_i)
1        1024        17                  2              2        8192
2        512         221 = 17 · 13       4              4        49152
3        128         1547 = 221 · 7      4              6        73728
4        32          7735 = 1547 · 5     4              4        73728
5        8           23205 = 7735 · 3    2              2        36864
6        4           69615 = 23205 · 3   2              3        55296
7        1           69615                                       55296

Table 2: Concrete choices for m_r = 512, φ(m_r) = 256

Step i   m_{r-i+1}   ℓ_i                 |B'_{r-i+1}|   |C'_i|   φ(m_{r-i+1} · ℓ_i)
1        512         17                  2              2        4096
2        256         221 = 17 · 13       4              4        24576
3        64          1547 = 221 · 7      4              6        36864
4        16          7735 = 1547 · 5     4              4        36864
5        4           23205 = 7735 · 3    2              2        18432
6        1           23205                                       18432
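The compositum dimensions in Tables 1 and 2 can be re-derived from the listed indices with Euler's totient function. The sketch below is only a consistency check under the assumption that the last column is φ(m_{r-i+1} · ℓ_i); it covers the rows that list all entries, and the names `TABLE_1` and `TABLE_2` are illustrative.

```python
def euler_phi(n):
    """Euler's totient function via trial-division factorization."""
    result, rest, d = n, n, 2
    while d * d <= rest:
        if rest % d == 0:
            while rest % d == 0:
                rest //= d
            result -= result // d
        d += 1
    if rest > 1:
        result -= result // rest
    return result

# (m_{r-i+1}, l_i, listed compositum dimension) for the fully specified rows.
TABLE_1 = [(1024, 17, 8192), (512, 221, 49152), (128, 1547, 73728),
           (32, 7735, 73728), (8, 23205, 36864), (4, 69615, 55296)]
TABLE_2 = [(512, 17, 4096), (256, 221, 24576), (64, 1547, 36864),
           (16, 7735, 36864), (4, 23205, 18432)]

def rows_consistent(rows):
    """True if every listed dimension equals phi(m * l)."""
    return all(euler_phi(m * l) == dim for m, l, dim in rows)
```

Both `rows_consistent(TABLE_1)` and `rows_consistent(TABLE_2)` hold for the values listed above.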
Table 3: Concrete choices for m_r = 256, φ(m_r) = 128
Abstract

The  Short  Integer  Solution  (SIS)  and  Learning  With  Errors  (LWE)  problems  are  the  foundations  for 
countless  applications  in  lattice-based  cryptography,  and  are  provably  as  hard  as  approximate  lattice 
problems in the worst case. An important question from both a practical and theoretical perspective is how
small  their  parameters  can  be  made,  while  preserving  their  hardness. 

We prove two main results on SIS and LWE with small parameters. For SIS, we show that the problem
retains its hardness for moduli q ≥ β · n^δ for any constant δ > 0, where β is the bound on the Euclidean
norm of the solution. This improves upon prior results which required q ≥ β · √(n log n), and is essentially
optimal since the problem is trivially easy for q ≤ β. For LWE, we show that it remains hard even when
the errors are small (e.g., uniformly random from {0, 1}), provided that the number of samples is small
enough (e.g., linear in the dimension n of the LWE secret). Prior results required the errors to have
magnitude at least √n and to come from a Gaussian-like distribution.


1  Introduction 

In modern lattice-based cryptography, two average-case computational problems serve as the foundation
of almost all cryptographic schemes: Short Integer Solution (SIS), and Learning With Errors (LWE). The
SIS problem dates back to Ajtai's pioneering work [1], and is defined as follows. Let n and q be integers,
where n is the primary security parameter and usually q = poly(n), and let β > 0. Given a uniformly
random matrix A ∈ Z_q^{n×m} for some m = poly(n), the goal is to find a nonzero integer vector z ∈ Z^m
such that Az = 0 mod q and ‖z‖ ≤ β (where ‖·‖ denotes Euclidean norm). Observe that β should be
set large enough to ensure that a solution exists (e.g., β > √(n log q) suffices), but that β ≥ q makes the
problem trivially easy to solve. Ajtai showed that for appropriate parameters, SIS enjoys a remarkable
worst-case/average-case hardness property: solving it on the average (with any noticeable probability) is at
least as hard as approximating several lattice problems on n-dimensional lattices in the worst case, to within
poly(n) factors.
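As a concrete reading of the definition, the following sketch checks a candidate SIS solution and shows why β ≥ q is trivial: the vector (q, 0, ..., 0) is nonzero, has norm q, and is mapped to 0 modulo q by every matrix. The function names and the toy parameters are illustrative assumptions, not part of the original text.

```python
import random

def is_sis_solution(A, z, q, beta):
    """True if z is nonzero, A z = 0 (mod q), and ||z||_2 <= beta."""
    if not any(z):
        return False
    # Compare squared norms to stay in exact integer arithmetic.
    if sum(c * c for c in z) > beta * beta:
        return False
    return all(sum(a * c for a, c in zip(row, z)) % q == 0 for row in A)

def trivial_solution(m, q):
    """The vector (q, 0, ..., 0), which solves SIS whenever beta >= q."""
    return [q] + [0] * (m - 1)

rng = random.Random(0)
n, m, q = 4, 8, 17
A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]  # n x m over Z_q
z = trivial_solution(m, q)
print(is_sis_solution(A, z, q, beta=q))      # True: norm is exactly q
print(is_sis_solution(A, z, q, beta=q - 1))  # False: norm bound is violated
```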
The LWE problem was introduced in the celebrated work of Regev [24], and has the same parameters n
and q, along with a "noise rate" α ∈ (0, 1). The problem (in its search form) is to find a secret vector
s ∈ Z_q^n, given a "noisy" random linear system A ∈ Z_q^{n×m}, b = A^T s + e mod q, where A is uniformly
random and the entries of e are i.i.d. from a Gaussian-like distribution with standard deviation roughly αq.
Regev showed that as long as αq ≥ 2√n, solving LWE on the average (with noticeable probability) is at
least as hard as approximating lattice problems in the worst case to within Õ(n/α) factors using a quantum
algorithm. Subsequently, Peikert [21] gave a classical reduction for a subset of the lattice problems and the
same approximation factors, but under the additional condition that q ≥ 2^{n/2} (or q ≥ 2√n/α based on some
non-standard lattice problems).
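To make the notation concrete, here is a minimal sketch that generates one search-LWE instance (A, b = A^T s + e mod q). It is an illustration under stated assumptions: the "Gaussian-like" error is modeled by rounding a continuous Gaussian of standard deviation αq, which is a stand-in rather than the distribution used in the cited reductions, and all names and toy parameters are hypothetical.

```python
import random

def lwe_instance(n, m, q, alpha, rng):
    """Return (A, b, s, e) with A in Z_q^{n x m} and b = A^T s + e mod q."""
    s = [rng.randrange(q) for _ in range(n)]
    A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
    # Assumption: rounded continuous Gaussian as the "Gaussian-like" error.
    e = [round(rng.gauss(0, alpha * q)) for _ in range(m)]
    b = [(sum(A[i][j] * s[i] for i in range(n)) + e[j]) % q for j in range(m)]
    return A, b, s, e

def residual(A, b, s, q):
    """Compute b - A^T s mod q, which equals e mod q for a valid instance."""
    n, m = len(A), len(b)
    return [(b[j] - sum(A[i][j] * s[i] for i in range(n))) % q for j in range(m)]
```

Knowing s, the residual reveals the error vector; the search problem asks for s given only A and b.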

A significant line of research has been devoted to improving the tightness of worst-case/average-case
connections for lattice problems. For SIS, a series of works [1, 7, 14, 19, 12] gave progressively better
parameters that guarantee hardness, and smaller approximation factors for the underlying lattice problems.
The state of the art (from [12], building upon techniques introduced in [19]) shows that for q ≥ β · ω(√(n log n)),
finding a SIS solution with norm bounded by β is as hard as approximating worst-case lattice problems to
within Õ(β√n) factors. (The parameter m does not play any significant role in the hardness results, and
can be any polynomial in n.) For LWE, Regev's initial result remains the tightest, and the requirement that
q ≥ √n/α (i.e., that the errors have magnitude at least √n) is in some sense optimal: a clever algorithm
due to Arora and Ge [2] solves LWE in time 2^{Õ(αq)^2}, so a proof of hardness for substantially smaller errors
would imply a subexponential time (quantum) algorithm for approximate lattice problems, which would be a
major breakthrough. Interestingly, the current modulus bound for LWE is in some sense better than the one
for SIS by a Ω̃(√n) factor: there are applications of LWE for 1/α = Õ(1) and hence q = Õ(√n), whereas
SIS is only useful for β ≥ √n, and therefore requires q ≥ n according to the state-of-the-art reductions.

Further  investigating  the  smallest  parameters  for  which  SIS  and  LWE  remain  provably  hard  is  important 
from  both  a  practical  and  theoretical  perspective.  On  the  practical  side,  improvements  would  lead  to 
smaller  cryptographic  keys  without  compromising  the  theoretical  security  guarantees,  or  may  provide  greater 
confidence  in  more  practical  parameter  settings  that  so  far  lack  provable  hardness.  Also,  proving  the  hardness 
of  LWE  for  non-Gaussian  error  distributions  (e.g.,  uniform  over  a  small  set)  would  make  applications  easier 
to  implement.  Theoretically,  improvements  may  eventually  shed  light  on  related  problems  like  Learning 
Parity  with  Noise  (LPN),  which  can  be  seen  as  a  special  case  of  LWE  for  modulus  q  =  2,  and  which  is 
widely  used  in  coding-based  cryptography,  but  which  has  no  known  proof  of  hardness. 

1.1  Our  Results 

We prove two complementary results on the hardness of SIS and LWE with small parameters. For SIS, we
show that the problem retains its hardness for moduli q nearly equal to the solution bound β. For LWE, we
show that it remains hard even when the errors are small (e.g., uniformly random from {0, 1}), provided that
the number m of noisy equations is small enough. This qualification is necessary in light of the Arora-Ge
attack [2], which for large enough m can solve LWE with binary errors in polynomial time. Details follow.
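The reason binary errors become easy with many samples can be sketched with the standard linearization idea. The display below is an informal outline under the assumption of a prime modulus q; it is not the full Arora-Ge analysis, and the symbols a_i, b_i, e_i for the i-th sample are notation introduced here.

```latex
% Sample i: b_i = <a_i, s> + e_i (mod q) with e_i in {0, 1}.
\begin{align*}
e_i \in \{0,1\} \;&\Longrightarrow\; e_i\,(e_i - 1) = 0, \\
\text{so}\quad
\bigl(b_i - \langle \mathbf{a}_i, \mathbf{s}\rangle\bigr)\,
\bigl(b_i - \langle \mathbf{a}_i, \mathbf{s}\rangle - 1\bigr)
\;&\equiv\; 0 \pmod{q}.
\end{align*}
% Each sample is a quadratic equation in s_1, ..., s_n. Treating every
% monomial s_j s_k (j <= k) and s_j as a fresh unknown gives a linear system with
\[
\binom{n+1}{2} + n \;=\; \frac{n(n+3)}{2} \;=\; O(n^2)
\]
% unknowns, so roughly n^2 samples can be expected to determine s by linear algebra.
```

With only m = O(n) samples the linearized system is heavily underdetermined, which is why bounding the number of samples leaves room for a hardness result.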

SIS with small modulus. Our first theorem says that SIS retains its hardness with a modulus as small as
q ≥ β · n^δ, for any δ > 0. Recall that the best previous reduction [12] required q ≥ β · ω(√(n log n)), and that
SIS becomes trivially easy for q ≤ β, so the q obtained by our proof is essentially optimal. It also essentially
closes the gap between LWE and SIS, in terms of how small a useful modulus can be. More precisely, the
following is a special case of our main SIS hardness theorem; see Section 3 for full details.
Theorem 1.1 (Corollary of Theorem 3.8). Let n and m = poly(n) be integers, let β ≥ β_∞ ≥ 1 be reals,
let Z = {z ∈ Z^m : ‖z‖_2 ≤ β and ‖z‖_∞ ≤ β_∞}, and let q ≥ β · n^δ for some constant δ > 0. Then solving
(on the average, with non-negligible probability) SIS with parameters n, m, q and solution set Z \ {0} is
at least as hard as approximating lattice problems in the worst case on n-dimensional lattices to within
γ = max{1, β · β_∞/q} · Õ(β√n) factors.

Of course, the ℓ_∞ bound on the SIS solutions can be easily removed simply by setting β_∞ = β, so that
‖z‖_∞ ≤ ‖z‖_2 ≤ β automatically holds true. We include an explicit ℓ_∞ bound β_∞ ≤ β in order to obtain
more precise hardness results, based on potentially smaller worst-case approximation factors γ. We point out
that the bound β_∞ and the associated extra term max{1, β · β_∞/q} in the worst-case approximation factor
is not present in previous results. Notice that this term can be as small as 1 (if we take q ≥ β · β_∞, and in
particular if β_∞ ≤ n^δ), and as large as β/n^δ (if β_∞ = β). This may be seen as the first theoretical evidence
that, at least when using a small modulus q, restricting the ℓ_∞ norm of the solutions may make the SIS
problem qualitatively harder than just restricting the ℓ_2 norm. There is already significant empirical evidence
for this belief: the most practically efficient attacks on SIS, which use lattice basis reduction (e.g., [11, 18]),
only find solutions with bounded ℓ_2 norm, whereas combinatorial attacks such as GO EH (see also [20]) or
theoretical lattice attacks @ that can guarantee an ℓ_∞ bound are much more costly in practice, and also
require exponential space. Finally, we mention that setting β_∞ ≪ β is very natural in the usual formulations
of one-way and collision-resistant hash functions based on SIS, where collisions correspond (for example)
to vectors in {−1, 0, 1}^m, and therefore have ℓ_∞ bound β_∞ = 1, but ℓ_2 bound β = √m. Similar gaps
between β_∞ and β can easily be enforced in other applications, e.g., digital signatures [12].
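The two extremes of the extra term can be worked out explicitly. The following is a short calculation from the statement of Theorem 1.1, taking the smallest allowed modulus q = β · n^δ as an illustrative choice.

```latex
% Hash-function collisions: z in {-1,0,1}^m, so beta_inf = 1 and beta = sqrt(m).
\[
\beta_\infty = 1,\quad \beta = \sqrt{m},\quad q = \beta\, n^{\delta}
\;\Longrightarrow\;
\max\Bigl\{1, \frac{\beta\,\beta_\infty}{q}\Bigr\}
= \max\Bigl\{1, \frac{1}{n^{\delta}}\Bigr\} = 1,
\qquad
\gamma = \tilde{O}\bigl(\beta\sqrt{n}\bigr) = \tilde{O}\bigl(\sqrt{mn}\bigr).
\]
% No l_inf restriction: beta_inf = beta.
\[
\beta_\infty = \beta,\quad q = \beta\, n^{\delta}
\;\Longrightarrow\;
\max\Bigl\{1, \frac{\beta^{2}}{q}\Bigr\}
= \max\Bigl\{1, \frac{\beta}{n^{\delta}}\Bigr\},
\qquad
\gamma = \max\Bigl\{1, \frac{\beta}{n^{\delta}}\Bigr\}\cdot \tilde{O}\bigl(\beta\sqrt{n}\bigr).
\]
```

In the first case the approximation factor matches the earlier Õ(β√n) bound even though the modulus is much smaller; in the second it degrades by up to a β/n^δ factor.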


LWE with small errors. In the case of LWE, we prove a general theorem offering a trade-off among
several different parameters, including the size of the errors, the dimension and number of samples in the
LWE problem, and the dimension of the underlying worst-case lattice problems. Here we mention just one
instantiation for the case of prime modulus and uniformly distributed binary (i.e., 0-1) errors, and refer the
reader to Section 4 and Theorem 4.6 for the more general statement and a discussion of the parameters.

Theorem 1.2 (Corollary of Theorem 4.6). Let n and m = n · (1 + Ω(1/log n)) be integers, and q ≥ n^{O(1)}
a sufficiently large polynomially bounded (prime) modulus. Then solving LWE with parameters n, m, q and
independent uniformly random binary errors (i.e., in {0, 1}) is at least as hard as approximating lattice
problems in the worst case on Θ(n/log n)-dimensional lattices within a factor γ = Õ(√n · q).


We remark that our results (see Theorem 4.6) apply to many other settings, including error vectors e ∈ X
chosen from any (sufficiently large) subset X ⊆ {0, 1}^m of binary strings, as well as error vectors with
larger entries. Interestingly, our hardness result for LWE with very small errors relies on the worst-case
hardness of lattice problems in dimension n' = O(n/log n), which is smaller than (but still quasi-linear
in) the dimension n of the LWE problem; however, this is needed only when considering very small error
vectors. Theorem 4.6 also shows that if e is chosen uniformly at random with entries bounded by n^ε (which
is still much smaller than √n), then the dimension of the underlying worst-case lattice problems (and the
number m − n of extra samples, beyond the LWE dimension n) can be linear in n.

The restriction that the number of LWE samples m = O(n) be linear in the dimension of the secret can
also be relaxed slightly. But some restriction is necessary, because LWE with small errors can be solved
in polynomial time when given an arbitrarily large polynomial number of samples. We focus on linear
m = O(n) because this is enough for most (but not all) applications in lattice cryptography, including
identity-based  encryption  and  fully  homomorphic  encryption,  when  the  parameters  are  set  appropriately. 
(The  one  exception  that  we  know  of  is  the  security  proof  for  pseudorandom  functions  01.) 
1.2  Techniques  and  Comparison  to  Related  Work

Our  results  for  SIS  and  LWE  are  technically  disjoint,  and  all  they  have  in  common  is  the  goal  of  proving 
hardness  results  for  smaller  values  of  the  parameters.  So,  we  describe  our  technical  contributions  in  the 
analysis  of  these  two  problems  separately. 

SIS with small modulus. For SIS, as a warm-up, we first give a proof for a special case of the problem
where the input is restricted to vectors of a special form (e.g., binary vectors). For this restricted version of
SIS, we are able to give a self-reduction (from SIS to SIS) which reduces the size of the modulus. So, we can
rely on previous worst-case to average-case reductions for SIS as "black boxes," resulting in an extremely
simple proof. However, this simple self-reduction has some drawbacks. Besides the undesirable restriction on
the SIS inputs, our reduction is rather loose with respect to the underlying worst-case lattice approximation
problem: in order to establish the hardness of SIS with small moduli q (and restricted inputs), one needs
to assume the worst-case hardness of lattice problems for rather large polynomial approximation factors.
(By contrast, previous hardness results for larger moduli [19, 12] only assumed hardness for quasi-linear
approximation factors.) We address both drawbacks by giving a direct reduction from worst-case lattice
problems to SIS with small modulus. This is our main SIS result, and it combines ideas from previous
work [19, 12] with two new technical ingredients:

• All previous SIS hardness proofs [1, 7, 14, 19, 12] solved worst-case lattice problems by iteratively
finding (sets of linearly independent) lattice vectors of shorter and shorter length. Our first new
technical ingredient (inspired by the pioneering work of Regev [24] on LWE) is the use of a different
intermediate problem: instead of finding progressively shorter lattice vectors, we consider the problem
of sampling lattice vectors according to Gaussian-like distributions of progressively smaller widths.
To the best of our knowledge, this is the first use of Gaussian lattice sampling as an intermediate
worst-case problem in the study of SIS, and it appears necessary to lower the SIS modulus below n.
We mention that Gaussian lattice sampling has been used before to reduce the modulus in hardness
reductions for SIS [12], but still within the framework of iteratively finding short vectors (which in [12]
are used to generate fresh Gaussian samples for the reduction), which results in larger moduli q > n.

• The use of Gaussian lattice sampling as an intermediate problem within the SIS hardness proof yields
linear combinations of several discrete Gaussian samples with adversarially chosen coefficients. Our
second technical ingredient, used to analyze these linear combinations, is a new convolution theorem
for discrete Gaussians (Theorem 3.3), which strengthens similar ones previously proved in [22, 6].
Here again, the strength of our new convolution theorem appears necessary to obtain hardness results
for SIS with modulus smaller than n.

Our  new  convolution  theorem  may  be  of  independent  interest,  and  might  find  applications  in  the  analysis  of 
other  lattice  algorithms. 

LWE with small errors. We now move to our results on LWE. For this problem, the best provably hard
parameters to date were those obtained in the original paper of Regev [24], which employed Gaussian errors,
and required them to have (expected) magnitude at least √n. These results were believed to be optimal due
to a clever algorithm of Arora and Ge [2], which solves LWE in subexponential time when the errors are
asymptotically smaller than √n. The possibility of circumventing this barrier by limiting the number of LWE
samples was first suggested by Micciancio and Mol [17], who gave "sample preserving" search-to-decision
reductions for LWE, and asked if LWE with small uniform errors could be proved hard when the number
of available samples is sufficiently small. Our results provide a first answer to this question, and employ
concepts and techniques from the work of Peikert and Waters [23] (see also |U) on lossy (trapdoor) functions.
In brief, a lossy function family is an indistinguishable pair of function families ℱ, ℒ such that functions in
ℱ are injective and those in ℒ are lossy, in the sense that they map their common domain to much smaller
sets, and therefore lose information about the input. As shown in [23], from the indistinguishability of ℱ and
ℒ, it follows that the families ℱ and ℒ are both one-way.

In Section 2 we present a generalized framework for the study of lossy function families, which does not
require the functions to have trapdoors, and applies to arbitrary (not necessarily uniform) input distributions.
While the techniques we use are all standard, and our definitions are minor generalizations of the ones given
in [23], we believe that our framework provides a conceptual simplification of previous work, relating the
relatively new notion of lossy functions to the classic security definitions of second-preimage resistance and
uninvertibility.

The lossy function framework is used to prove the hardness of LWE with small uniform errors and
(necessarily) a small number of samples. Specifically, we use the standard LWE problem (with large
Gaussian errors) to set up a lossy function family ℱ, ℒ. (Similar families with trapdoors were constructed
in mm, but not for the parameterizations required to obtain interesting hardness results for LWE.) The
indistinguishability of ℱ and ℒ follows directly from the hardness of the underlying LWE problem. The
new hardness result for LWE (with small errors) is equivalent to the one-wayness of ℱ, and is proved by
a relatively standard analysis of the second-preimage resistance and uninvertibility of certain subset-sum
functions associated to ℒ.

Comparison to related work. In an independent work that was submitted concurrently with ours, Döttling
and Müller-Quade [10] also used a lossyness argument to prove new hardness results for LWE. (Their work
does not address the SIS problem.) At a syntactic level, they use LWE (i.e., generating matrix) notation and
a new concept they call "lossy codes," while here we use SIS (i.e., parity-check matrix) notation and rely
on the standard notions of uninvertible and second-preimage resistant functions. By the dual equivalence of
SIS and LWE [15, 17] (see Proposition 2.9), this can be considered a purely syntactic difference, and the
high-level lossyness strategy (including the lossy function family construction) used in [10] and in our work
are essentially the same. However, the low-level analysis techniques and final results are quite different. The
main result proved in [10] is essentially the following.

Theorem 1.3 ([10]). Let n, q, m = n^{O(1)} and r ≥ n^{1/2+ε} · m be integers, for an arbitrary small
constant ε > 0. Then the LWE problem with parameters n, m, q and independent uniformly distributed errors in
{−r, ..., r}^m is at least as hard as (quantumly) solving worst-case problems on (n/2)-dimensional lattices
to within a factor γ = n^{1+ε} · mq/r.

The contribution of [10] over previous work is to prove the hardness of LWE for uniformly distributed
errors, as opposed to errors that follow a Gaussian distribution. Notice that the magnitude of the errors used
in [10] is always at least √n · m, which is substantially larger (by a factor of m) than in previous results. So,
[10] makes no progress towards reducing the magnitude of the errors, which is the main goal of this paper.
By contrast, our work shows the hardness of LWE for errors smaller than √n (indeed, as small as {0, 1}),
provided the number of samples is sufficiently small.

Like our work, [10] requires the number of LWE samples m to be fixed in advance (because the error
magnitude r depends on m), but it allows m to be an arbitrary polynomial in n. This is possible because
for the large errors r ≫ √n considered in [10], the attack of [2] runs in at least exponential time. So, in
principle, it may even be possible (and is an interesting open problem) to prove the hardness of LWE with
(large) uniform errors as in [10], but for an unbounded number of samples. In our work, hardness of LWE
for errors smaller than √n is proved for a much smaller number of samples m, and this is necessary in order
to avoid the subexponential time attack of [2].

While the focus of our work is on LWE with small errors, we remark that our main LWE hardness result
(Theorem 4.6) can also be instantiated using large polynomial errors r = n^{O(1)} to obtain any (linear) number
of samples m = O(n). In this setting, [10] provides a much better dependency between the magnitude of the
errors and the number of samples (which in [10] can be an arbitrary polynomial). This is due to substantial
differences in the low-level techniques employed in [10] and in our work to analyze the statistical properties
of the lossy function family. For these same reasons, even for large errors, our results seem incomparable to
those of [10] because we allow for a much wider class of error distributions.

2  Preliminaries 

We use uppercase roman letters $F$, $X$ for sets, lowercase roman for set elements $x \in X$, bold $\mathbf{x} \in X^n$
for vectors, and calligraphic letters $\mathcal{F}, \mathcal{X}, \ldots$ for probability distributions. The support of a probability
distribution $\mathcal{X}$ is denoted $[\mathcal{X}]$. The uniform distribution over a finite set $X$ is denoted $\mathcal{U}(X)$.

Two probability distributions $\mathcal{X}$ and $\mathcal{Y}$ are $(t, \epsilon)$-indistinguishable if for all (probabilistic) algorithms $\mathcal{D}$
running in time at most $t$,

\[ \left| \Pr[x \leftarrow \mathcal{X} : \mathcal{D}(x) \text{ accepts}] - \Pr[y \leftarrow \mathcal{Y} : \mathcal{D}(y) \text{ accepts}] \right| \leq \epsilon. \]

2.1  One-Way  Functions 

A function family is a probability distribution $\mathcal{F}$ over a set of functions $F \subseteq (X \to Y)$ with common
domain  X  and  range  Y.  Formally,  function  families  are  defined  as  distributions  over  bit  strings  (function 
descriptions)  together  with  an  evaluation  algorithm,  mapping  each  bitstring  to  a  corresponding  function,  with 
possibly  multiple  descriptions  associated  to  the  same  function.  In  this  paper,  for  notational  simplicity,  we 
identify  functions  and  their  description,  and  unless  stated  otherwise,  all  statements  about  function  families 
should  be  interpreted  as  referring  to  the  corresponding  probability  distributions  over  function  descriptions. 
For example, if we say that two function families $\mathcal{F}$ and $\mathcal{G}$ are indistinguishable, we mean that no efficient
algorithm can distinguish between function descriptions selected according to either $\mathcal{F}$ or $\mathcal{G}$, where $\mathcal{F}$ and
$\mathcal{G}$ are probability distributions over bitstrings that are interpreted as functions using the same evaluation
algorithm.

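A function family $\mathcal{F}$ is $(t, \epsilon)$-collision resistant if for all (probabilistic) algorithms $\mathcal{A}$ running in time at
most $t$,

\[ \Pr[f \leftarrow \mathcal{F},\ (x, x') \leftarrow \mathcal{A}(f) : f(x) = f(x') \wedge x \neq x'] \leq \epsilon. \]

Let $\mathcal{X}$ be a probability distribution over the domain $X$ of a function family $\mathcal{F}$. We recall the following
standard security notions:

•  $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-one-way if for all probabilistic algorithms $\mathcal{A}$ running in time at most $t$,

\[ \Pr[f \leftarrow \mathcal{F},\ x \leftarrow \mathcal{X} : \mathcal{A}(f, f(x)) \in f^{-1}(f(x))] \leq \epsilon. \]

•  $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-uninvertible if for all probabilistic algorithms $\mathcal{A}$ running in time at most $t$,

\[ \Pr[f \leftarrow \mathcal{F},\ x \leftarrow \mathcal{X} : \mathcal{A}(f, f(x)) = x] \leq \epsilon. \]

•  $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-second preimage resistant if for all probabilistic algorithms $\mathcal{A}$ running in time at most $t$,

\[ \Pr[f \leftarrow \mathcal{F},\ x \leftarrow \mathcal{X},\ x' \leftarrow \mathcal{A}(f, x) : f(x) = f(x') \wedge x \neq x'] \leq \epsilon. \]

•  $(\mathcal{F}, \mathcal{X})$ is $(t, \epsilon)$-pseudorandom if the distributions $\{f \leftarrow \mathcal{F},\ x \leftarrow \mathcal{X} : (f, f(x))\}$ and
$\{f \leftarrow \mathcal{F},\ y \leftarrow \mathcal{U}(Y) : (f, y)\}$ are $(t, \epsilon)$-indistinguishable.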

The  above  probabilities  (or  the  absolute  difference  between  probabilities,  for  indistinguishability)  are 
called  the  advantages  in  breaking  the  corresponding  security  notions.  It  easily  follows  from  the  definition 
that  if  a  function  family  is  one-way  with  respect  to  any  input  distribution  X,  then  it  is  also  uninvertible  with 
respect  to  the  same  input  distribution  X.  Also,  if  a  function  family  is  collision  resistant,  then  it  is  also  second 
preimage  resistant  with  respect  to  any  efficiently  samplable  input  distribution. 

All security definitions are immediately adapted to the asymptotic setting, where we implicitly consider
sequences of finite function families indexed by a security parameter. In this setting, for any security definition
(one-wayness, collision resistance, etc.) we omit $t$, and simply say that a function is secure if for any $t$ that is
polynomial in the security parameter, it is $(t, \epsilon)$-secure for some $\epsilon$ that is negligible in the security parameter.
We say that a function family is statistically secure if it is $(t, \epsilon)$-secure for some negligible $\epsilon$ and arbitrary $t$,
i.e., it is secure even with respect to computationally unbounded adversaries.

The  composition  of  function  families  is  defined  in  the  natural  way.  Namely,  for  any  two  function  families 
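with $[\mathcal{F}] \subseteq X \to Y$ and $[\mathcal{G}] \subseteq Y \to Z$, the composition $\mathcal{G} \circ \mathcal{F}$ is the function family that selects $f \leftarrow \mathcal{F}$ and
$g \leftarrow \mathcal{G}$ independently at random, and outputs the function $(g \circ f) \colon X \to Z$.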

2.2  Lossy  Function  Families 

Lossy functions, introduced in [23], are usually defined in the context of trapdoor function families, where
the  functions  are  efficiently  invertible  with  the  help  of  some  trapdoor  information,  and  therefore  injective  (at 
least  with  high  probability  over  the  choice  of  the  key).  We  give  a  more  general  definition  of  lossy  function 
families  that  applies  to  non-injective  functions  and  arbitrary  input  distributions,  though  we  will  be  mostly 
interested  in  input  distributions  that  are  uniform  over  some  set. 

Definition 2.1. Let $\mathcal{L}$, $\mathcal{F}$ be two probability distributions (with possibly different supports) over the same set
of (efficiently computable) functions $F \subseteq X \to Y$, and let $\mathcal{X}$ be an efficiently sampleable distribution over
the domain $X$. We say that $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family if the following properties are satisfied:

•  the distributions $\mathcal{L}$ and $\mathcal{F}$ are indistinguishable,

•  $(\mathcal{L}, \mathcal{X})$ is uninvertible, and

•  $(\mathcal{F}, \mathcal{X})$ is second preimage resistant.

The  uninvertibility  and  second  preimage  resistance  properties  can  be  either  computational  or  statistical. 
(The definition from [23] requires both to be statistical.) We remark that uninvertible functions and second
preimage  resistant  functions  are  not  necessarily  one-way.  For  example,  the  constant  function  f(x)  =  0  is 
(statistically)  uninvertible  when  |X|  is  super-polynomial  in  the  security  parameter,  and  the  identity  function 
f(x)  =  x  is  (statistically)  second  preimage  resistant  (in  fact,  even  collision  resistant),  but  neither  is  one-way. 
Still,  if  a  function  family  is  simultaneously  uninvertible  and  second  preimage  resistant,  then  one-wayness 
easily  follows. 

Lemma 2.2. Let $\mathcal{F}$ be a family of functions computable in time $t'$. If $(\mathcal{F}, \mathcal{X})$ is both $(t, \epsilon)$-uninvertible and
$(t + t', \epsilon')$-second preimage resistant, then it is also $(t, \epsilon + \epsilon')$-one-way.

Proof. Let $\mathcal{A}$ be an algorithm running in time at most $t$ and attacking the one-wayness property of $(\mathcal{F}, \mathcal{X})$.
Let $f \leftarrow \mathcal{F}$ and $x \leftarrow \mathcal{X}$ be chosen at random, and compute $y \leftarrow \mathcal{A}(f, f(x))$. We want to bound the
probability that $f(x) = f(y)$. We consider two cases:

•  If $x = y$, then $\mathcal{A}$ breaks the uninvertibility property of $(\mathcal{F}, \mathcal{X})$.

•  If $x \neq y$, then $\mathcal{A}'(f, x) = \mathcal{A}(f, f(x))$ breaks the second preimage property of $(\mathcal{F}, \mathcal{X})$.

By assumption, the probabilities of these two events are at most $\epsilon$ and $\epsilon'$ respectively. By the union bound, $\mathcal{A}$
breaks the one-wayness property with advantage at most $\epsilon + \epsilon'$.  □

It easily follows by a simple indistinguishability argument that if $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family,
then both $(\mathcal{L}, \mathcal{X})$ and $(\mathcal{F}, \mathcal{X})$ are one-way.

Lemma 2.3. Let $\mathcal{F}$ and $\mathcal{F}'$ be any two indistinguishable, efficiently computable function families, and let $\mathcal{X}$
be an efficiently sampleable input distribution. Then if $(\mathcal{F}, \mathcal{X})$ is uninvertible (respectively, second-preimage
resistant), then $(\mathcal{F}', \mathcal{X})$ is also uninvertible (resp., second-preimage resistant). In particular, if $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a
lossy function family, then $(\mathcal{L}, \mathcal{X})$ and $(\mathcal{F}, \mathcal{X})$ are both one-way.

Proof. Assume that $(\mathcal{F}, \mathcal{X})$ is uninvertible and that there exists an efficient algorithm $\mathcal{A}$ breaking the
uninvertibility property of $(\mathcal{F}', \mathcal{X})$. Then $\mathcal{F}$ and $\mathcal{F}'$ can be efficiently distinguished by the following
algorithm $\mathcal{D}(f)$: choose $x \leftarrow \mathcal{X}$, compute $x' \leftarrow \mathcal{A}(f, f(x))$, and accept if $\mathcal{A}$ succeeded, i.e., if $x = x'$.

Next, assume that $(\mathcal{F}, \mathcal{X})$ is second preimage resistant, and that there exists an efficient algorithm $\mathcal{A}$
breaking the second preimage resistance property of $(\mathcal{F}', \mathcal{X})$. Then $\mathcal{F}$ and $\mathcal{F}'$ can be efficiently distinguished
by the following algorithm $\mathcal{D}(f)$: choose $x \leftarrow \mathcal{X}$, compute $x' \leftarrow \mathcal{A}(f, x)$, and accept if $\mathcal{A}$ succeeded, i.e., if
$x \neq x'$ and $f(x) = f(x')$.

It follows that if $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family, then $(\mathcal{L}, \mathcal{X})$ and $(\mathcal{F}, \mathcal{X})$ are both uninvertible and
second preimage resistant. Therefore, by Lemma 2.2 they are also one-way.  □

The standard definition of (injective) lossy trapdoor functions [23] is usually stated by requiring the ratio
$|f(X)|/|X|$ to be small. Our general definition can easily be related to the standard definition by specializing
it  to  uniform  input  distributions.  The  next  lemma  gives  an  equivalent  characterization  of  uninvertible  functions 
when  the  input  distribution  is  uniform. 

Lemma 2.4. Let $\mathcal{L}$ be a family of functions on a common domain $X$, and let $\mathcal{X} = \mathcal{U}(X)$ be the uniform
input distribution over $X$. Then $(\mathcal{L}, \mathcal{X})$ is $\epsilon$-uninvertible (even statistically, with respect to computationally
unbounded adversaries) for $\epsilon = \mathbb{E}_{f \leftarrow \mathcal{L}}[|f(X)|]/|X|$.

Proof. Fix a function $f$, and choose a random input $x \leftarrow \mathcal{X}$. The best (computationally unbounded) attack
on the uninvertibility of $(\mathcal{L}, \mathcal{X})$, given input $f$ and $y = f(x)$, outputs an $x' \in X$ such that $f(x') = y$ and
the probability of $x'$ under $\mathcal{X}$ is maximized. Since $\mathcal{X}$ is the uniform distribution over $X$, the conditional
distribution of $x$ given $y$ is uniform over $f^{-1}(y)$, and the attack succeeds with probability $1/|f^{-1}(y)|$. Each $y$
is output by $f$ with probability $|f^{-1}(y)|/|X|$. So, the success probability of the attack is

\[ \sum_{y \in f(X)} \frac{|f^{-1}(y)|}{|X|} \cdot \frac{1}{|f^{-1}(y)|} = \frac{|f(X)|}{|X|}. \]

Taking the expectation over the choice of $f$, we get that the attacker succeeds with probability $\epsilon$.  □
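
The counting argument in the proof of Lemma 2.4 can be checked by brute force on tiny domains. The following is a minimal sketch, not part of the original text: the helper names `optimal_inversion_probability` and `image_ratio` and the three toy functions are hypothetical, chosen only for illustration. It computes the success probability of the best unbounded inverter directly from preimage sizes and compares it with the ratio $|f(X)|/|X|$.

```python
from collections import Counter
from fractions import Fraction


def optimal_inversion_probability(f, domain):
    """Success probability of the best unbounded inverter on a uniform input.

    Each image y occurs with probability |f^-1(y)|/|X|, and the best guess
    for the preimage is then correct with probability 1/|f^-1(y)|.
    """
    preimage_sizes = Counter(f(x) for x in domain)
    total = Fraction(0)
    for size in preimage_sizes.values():
        total += Fraction(size, len(domain)) * Fraction(1, size)
    return total


def image_ratio(f, domain):
    """The quantity |f(X)|/|X| from Lemma 2.4, for one fixed function f."""
    return Fraction(len({f(x) for x in domain}), len(domain))


# Toy functions on X = {0, ..., 15} (illustrative assumptions only).
domain = list(range(16))
constant = lambda x: 0   # maximally lossy: a single image point
identity = lambda x: x   # injective: trivially invertible
mod4 = lambda x: x % 4   # four image points, each with four preimages

for name, f in [("constant", constant), ("identity", identity), ("mod4", mod4)]:
    print(name, optimal_inversion_probability(f, domain), image_ratio(f, domain))
```

The constant and identity functions match the examples given after Definition 2.1: the former is hard to invert only because its image is tiny, the latter is injective and therefore trivially invertible.
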

We conclude this section with the observation that uninvertibility behaves as expected with respect to
function composition.

Lemma 2.5. If $(\mathcal{F}, \mathcal{X})$ is uninvertible and $\mathcal{G}$ is any family of efficiently computable functions, then $(\mathcal{G} \circ \mathcal{F}, \mathcal{X})$
is also uninvertible.

Proof. Any inverter $\mathcal{A}$ for $\mathcal{G} \circ \mathcal{F}$ can be easily transformed into an inverter $\mathcal{A}'(f, y)$ for $(\mathcal{F}, \mathcal{X})$ that chooses
$g \leftarrow \mathcal{G}$ at random, and outputs the result of running $\mathcal{A}(g \circ f, g(y))$.  □

A  similar  statement  holds  also  for  one-wayness,  under  the  additional  assumption  that  Q  is  second  preimage 
resistant,  but  it  is  not  needed  here. 

2.3  Lattices  and  Gaussians 

An $n$-dimensional lattice of rank $k$ is the set $\Lambda$ of integer combinations of $k$ linearly independent vectors
$\mathbf{b}_1, \ldots, \mathbf{b}_k \in \mathbb{R}^n$, i.e., $\Lambda = \{\sum_{i=1}^{k} x_i \mathbf{b}_i \mid x_i \in \mathbb{Z} \text{ for } i = 1, \ldots, k\}$. The matrix $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_k]$ is called
a basis for the lattice $\Lambda$. The dual of a (not necessarily full-rank) lattice $\Lambda$ is the set $\Lambda^* = \{\mathbf{x} \in \mathrm{span}(\Lambda) :
\forall \mathbf{y} \in \Lambda,\ \langle \mathbf{x}, \mathbf{y} \rangle \in \mathbb{Z}\}$. In what follows, unless otherwise specified we work with full-rank lattices, where
$k = n$.

The $i$th successive minimum $\lambda_i(\Lambda)$ is the smallest radius $r$ such that $\Lambda$ contains $i$ linearly independent
vectors of (Euclidean) length at most $r$. A fundamental computational problem in the study of lattice
cryptography is the approximate Shortest Independent Vectors Problem $\mathrm{SIVP}_\gamma$, which, on input a full-rank
$n$-dimensional lattice $\Lambda$ (typically represented by a basis), asks to find $n$ linearly independent lattice vectors
$\mathbf{v}_1, \ldots, \mathbf{v}_n \in \Lambda$ all of length at most $\gamma \cdot \lambda_n(\Lambda)$, where $\gamma \geq 1$ is an approximation factor and is usually a
function of the lattice dimension $n$. Another problem is the (decision version of the) approximate Shortest
Vector Problem $\mathrm{GapSVP}_\gamma$, which, on input an $n$-dimensional lattice $\Lambda$, asks to output "yes" if $\lambda_1(\Lambda) \leq 1$
and "no" if $\lambda_1(\Lambda) > \gamma$. (If neither is the case, any answer is acceptable.)

For a matrix $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_k]$ of linearly independent vectors, the Gram-Schmidt orthogonalization $\tilde{\mathbf{B}}$
is the matrix of vectors $\tilde{\mathbf{b}}_i$ where $\tilde{\mathbf{b}}_1 = \mathbf{b}_1$, and for each $i = 2, \ldots, k$, the vector $\tilde{\mathbf{b}}_i$ is the projection of $\mathbf{b}_i$
orthogonal to $\mathrm{span}(\mathbf{b}_1, \ldots, \mathbf{b}_{i-1})$. The Gram-Schmidt minimum of a lattice $\Lambda$ is $\tilde{bl}(\Lambda) = \min_{\mathbf{B}} \|\tilde{\mathbf{B}}\|$, where
$\|\tilde{\mathbf{B}}\| = \max_i \|\tilde{\mathbf{b}}_i\|$ and the minimum is taken over all bases $\mathbf{B}$ of $\Lambda$. Given any basis $\mathbf{D}$ of a lattice $\Lambda$ and
any set $\mathbf{S}$ of linearly independent vectors in $\Lambda$, it is possible to efficiently construct a basis $\mathbf{B}$ of $\Lambda$ such that
$\|\tilde{\mathbf{B}}\| \leq \|\tilde{\mathbf{S}}\|$ (see [16]).
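
As an illustration of the Gram-Schmidt orthogonalization just defined, here is a small sketch (not from the original text) using exact rational arithmetic. The function names `gram_schmidt` and `gs_max_norm_squared` are hypothetical, and basis vectors are assumed to be given as Python lists of integers, one list per basis vector. It returns the squared value of $\|\tilde{\mathbf{B}}\|$ to avoid square roots.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two vectors given as sequences of numbers."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return the (unnormalized) Gram-Schmidt vectors of `basis`.

    The i-th output vector is the projection of the i-th input vector
    orthogonal to the span of all earlier input vectors.
    """
    ortho = []
    for b in basis:
        v = [Fraction(c) for c in b]
        for u in ortho:
            mu = Fraction(dot(b, u)) / dot(u, u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho


def gs_max_norm_squared(basis):
    """Squared length of the longest Gram-Schmidt vector of `basis`."""
    return max(dot(v, v) for v in gram_schmidt(basis))


# The same two vectors in a different order give a different value,
# which is why the Gram-Schmidt minimum is taken over all bases.
print(gs_max_norm_squared([[3, 0], [1, 2]]))
print(gs_max_norm_squared([[1, 2], [3, 0]]))
```

The sketch only evaluates $\|\tilde{\mathbf{B}}\|$ for a given ordered basis; it makes no attempt to find a basis achieving the Gram-Schmidt minimum.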

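The Gaussian function $\rho_s \colon \mathbb{R}^m \to \mathbb{R}$ with parameter $s$ is defined as $\rho_s(\mathbf{x}) = \exp(-\pi \|\mathbf{x}\|^2 / s^2)$. When $s$
is omitted, it is assumed to be 1. The discrete Gaussian distribution $D_{\Lambda + \mathbf{c}, s}$ with parameter $s$ over a lattice
coset $\Lambda + \mathbf{c}$ is the distribution that samples each element $\mathbf{x} \in \Lambda + \mathbf{c}$ with probability $\rho_s(\mathbf{x}) / \rho_s(\Lambda + \mathbf{c})$, where
$\rho_s(\Lambda + \mathbf{c}) = \sum_{\mathbf{y} \in \Lambda + \mathbf{c}} \rho_s(\mathbf{y})$ is a normalization factor.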
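
To make these definitions concrete, the following sketch (an illustration, not part of the original text) evaluates the Gaussian function and tabulates the discrete Gaussian over a coset of the one-dimensional lattice $\mathbb{Z}$. The function names and the truncation parameter `tail` are assumptions made here; the infinite normalization sum is approximated by a finite window, which is accurate only because the Gaussian mass decays very quickly.

```python
import math


def rho(x, s=1.0):
    """Gaussian function rho_s(x) = exp(-pi * ||x||^2 / s^2) for a vector x."""
    return math.exp(-math.pi * sum(c * c for c in x) / s ** 2)


def discrete_gaussian_pmf_Z(s, c=0.0, tail=12):
    """Approximate probability mass function of D_{Z + c, s}.

    The support is truncated to points k + c with |k| <= ceil(tail * s);
    this truncation is an assumption of the sketch, not of the definition.
    """
    radius = math.ceil(tail * s)
    support = [k + c for k in range(-radius, radius + 1)]
    weights = [rho([x], s) for x in support]
    total = sum(weights)  # stands in for the normalization factor rho_s(Z + c)
    return {x: w / total for x, w in zip(support, weights)}


pmf = discrete_gaussian_pmf_Z(s=2.0)
print(pmf[0], pmf[1], pmf[-1])
```
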

For any $\epsilon > 0$, the smoothing parameter $\eta_\epsilon(\Lambda)$ [19] is the smallest $s > 0$ such that $\rho_{1/s}(\Lambda^* \setminus \{\mathbf{0}\}) \leq \epsilon$.
When $\epsilon$ is omitted, it is some unspecified negligible function $\epsilon = n^{-\omega(1)}$ of the lattice dimension or security
parameter $n$, which may vary from place to place.

We  observe  that  the  smoothing  parameter  satisfies  the  following  decomposition  lemma.  The  general  case 
for  the  sum  of  several  lattices  (whose  linear  spans  have  trivial  pairwise  intersections)  follows  immediately  by 
induction. 

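Lemma 2.6. Let lattice $\Lambda = \Lambda_1 + \Lambda_2$ be the (internal direct) sum of two lattices such that $\mathrm{span}(\Lambda_1) \cap
\mathrm{span}(\Lambda_2) = \{\mathbf{0}\}$, and let $\tilde{\Lambda}_2$ be the projection of $\Lambda_2$ orthogonal to $\mathrm{span}(\Lambda_1)$. Then for any $\epsilon_1, \epsilon_2, \epsilon > 0$ such
that $1 + \epsilon = (1 + \epsilon_1)(1 + \epsilon_2)$, we have

\[ \eta_\epsilon(\tilde{\Lambda}_2) \leq \eta_\epsilon(\Lambda) \leq \eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2) \leq \max\{\eta_{\epsilon_1}(\Lambda_1), \eta_{\epsilon_2}(\tilde{\Lambda}_2)\}. \]

Proof. Let $\Lambda^*$, $\Lambda_1^*$ and $\tilde{\Lambda}_2^*$ be the dual lattices of $\Lambda$, $\Lambda_1$ and $\tilde{\Lambda}_2$, respectively. For the first inequality, notice
that $\tilde{\Lambda}_2^*$ is a sublattice of $\Lambda^*$. Therefore, $\rho_{1/s}(\tilde{\Lambda}_2^* \setminus \{\mathbf{0}\}) \leq \rho_{1/s}(\Lambda^* \setminus \{\mathbf{0}\})$ for any $s > 0$, and thus
$\eta_\epsilon(\tilde{\Lambda}_2) \leq \eta_\epsilon(\Lambda)$.

Next we prove that $\eta_\epsilon(\Lambda) \leq \eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2)$. It is routine to verify that we can express the dual lattice $\Lambda^*$
as the sum $\Lambda^* = \tilde{\Lambda}_1^* + \tilde{\Lambda}_2^*$, where $\tilde{\Lambda}_1$ is the projection of $\Lambda_1$ orthogonal to $\mathrm{span}(\Lambda_2)$, and $\tilde{\Lambda}_1^*$ is its dual.
Moreover, the projection of $\tilde{\Lambda}_1^*$ orthogonal to $\mathrm{span}(\tilde{\Lambda}_2^*)$ is exactly $\Lambda_1^*$. For any $\tilde{\mathbf{x}}_1 \in \tilde{\Lambda}_1^*$, let $\mathbf{x}_1 \in \Lambda_1^*$ denote
its projection orthogonal to $\mathrm{span}(\tilde{\Lambda}_2^*)$. Then for any $s > 0$ we have

\begin{align*}
\rho_{1/s}(\Lambda^*) &= \sum_{\tilde{\mathbf{x}}_1 \in \tilde{\Lambda}_1^*} \sum_{\tilde{\mathbf{x}}_2 \in \tilde{\Lambda}_2^*} \rho_{1/s}(\tilde{\mathbf{x}}_1 + \tilde{\mathbf{x}}_2) \\
&= \sum_{\tilde{\mathbf{x}}_1 \in \tilde{\Lambda}_1^*} \sum_{\tilde{\mathbf{x}}_2 \in \tilde{\Lambda}_2^*} \rho_{1/s}(\mathbf{x}_1) \cdot \rho_{1/s}((\tilde{\mathbf{x}}_1 - \mathbf{x}_1) + \tilde{\mathbf{x}}_2) \\
&= \sum_{\tilde{\mathbf{x}}_1 \in \tilde{\Lambda}_1^*} \rho_{1/s}(\mathbf{x}_1) \cdot \rho_{1/s}((\tilde{\mathbf{x}}_1 - \mathbf{x}_1) + \tilde{\Lambda}_2^*) \\
&\leq \rho_{1/s}(\Lambda_1^*) \cdot \rho_{1/s}(\tilde{\Lambda}_2^*) = \rho_{1/s}(\Lambda_1^* + \tilde{\Lambda}_2^*) = \rho_{1/s}((\Lambda_1 + \tilde{\Lambda}_2)^*),
\end{align*}

where the inequality follows from the bound $\rho_{1/s}(\Lambda + \mathbf{c}) \leq \rho_{1/s}(\Lambda)$ from [19, Lemma 2.9], and the last two
equalities follow from the orthogonality of $\Lambda_1^*$ and $\tilde{\Lambda}_2^*$. This proves that $\eta_\epsilon(\Lambda) \leq \eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2)$.

Finally, for $s_1 = \eta_{\epsilon_1}(\Lambda_1)$, $s_2 = \eta_{\epsilon_2}(\tilde{\Lambda}_2)$ and $s = \max\{s_1, s_2\}$, we have

\[ \rho_{1/s}((\Lambda_1 + \tilde{\Lambda}_2)^*) = \rho_{1/s}(\Lambda_1^*) \cdot \rho_{1/s}(\tilde{\Lambda}_2^*) \leq \rho_{1/s_1}(\Lambda_1^*) \cdot \rho_{1/s_2}(\tilde{\Lambda}_2^*) = (1 + \epsilon_1)(1 + \epsilon_2) = 1 + \epsilon. \]

Therefore, $\eta_\epsilon(\Lambda_1 + \tilde{\Lambda}_2) \leq s$.  □
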

Using the decomposition lemma, one easily obtains known bounds on the smoothing parameter. For
example, for any lattice basis $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_n]$, applying Lemma 2.6 repeatedly to the decomposition into
the rank-1 lattices defined by each of the basis vectors yields $\eta(\mathbf{B} \cdot \mathbb{Z}^n) \leq \max_i \eta(\tilde{\mathbf{b}}_i \cdot \mathbb{Z}) = \|\tilde{\mathbf{B}}\| \cdot \omega_n$,
where $\omega_n = \eta(\mathbb{Z}) = \omega(\sqrt{\log n})$ is the smoothing parameter of the integer lattice $\mathbb{Z}$. Choosing a basis $\mathbf{B}$
achieving $\tilde{bl}(\Lambda) = \min_{\mathbf{B}} \|\tilde{\mathbf{B}}\|$ (where the minimum is taken over all bases $\mathbf{B}$ of $\Lambda$), we get the bound
$\eta(\Lambda) \leq \tilde{bl}(\Lambda) \cdot \omega_n$ from [12, Theorem 3.1]. Similarly, choosing a set $\mathbf{S} \subseteq \Lambda$ of linearly independent vectors
of length $\|\mathbf{S}\| \leq \lambda_n(\Lambda)$, we get the bound $\eta(\Lambda) \leq \eta(\mathbf{S} \cdot \mathbb{Z}^n) \leq \|\tilde{\mathbf{S}}\| \cdot \omega_n \leq \|\mathbf{S}\| \cdot \omega_n = \lambda_n(\Lambda) \cdot \omega_n$ from [19,
Lemma 3.3]. In this paper we use a further generalization of these bounds, still easily obtained from the
decomposition lemma.

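Corollary 2.7. The smoothing parameter of the tensor product of any two lattices $\Lambda_1$, $\Lambda_2$ satisfies
$\eta(\Lambda_1 \otimes \Lambda_2) \leq \tilde{bl}(\Lambda_1) \cdot \eta(\Lambda_2)$.

Proof. Let $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_k]$ be a basis of $\Lambda_1$ achieving $\max_i \|\tilde{\mathbf{b}}_i\| = \tilde{bl}(\Lambda_1)$, and consider the natural
decomposition of $\Lambda_1 \otimes \Lambda_2$ into the sum

\[ (\mathbf{b}_1 \otimes \Lambda_2) + \cdots + (\mathbf{b}_k \otimes \Lambda_2). \]

Notice that the projection of each sublattice $\mathbf{b}_i \otimes \Lambda_2$ orthogonal to the previous sublattices $\mathbf{b}_j \otimes \Lambda_2$ (for
$j < i$) is precisely $\tilde{\mathbf{b}}_i \otimes \Lambda_2$, and has smoothing parameter $\eta(\tilde{\mathbf{b}}_i \otimes \Lambda_2) = \|\tilde{\mathbf{b}}_i\| \cdot \eta(\Lambda_2)$. Therefore, by repeated
application of Lemma 2.6 we have $\eta(\Lambda_1 \otimes \Lambda_2) \leq \max_i \|\tilde{\mathbf{b}}_i\| \cdot \eta(\Lambda_2) = \tilde{bl}(\Lambda_1) \cdot \eta(\Lambda_2)$.  □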

The following proposition relates the problem of sampling lattice vectors according to a Gaussian
distribution to the SIVP.

Proposition 2.8 ([24, Lemma 3.17]). There is a polynomial time algorithm that, given a basis for an
$n$-dimensional lattice $\Lambda$ and polynomially many samples from $D_{\Lambda,\sigma}$ for some $\sigma \geq 2\eta(\Lambda)$, solves $\mathrm{SIVP}_\gamma$ on
input lattice $\Lambda$ (in the worst case over $\Lambda$, and with overwhelming probability over the choice of the lattice
samples) for approximation factor $\gamma = \sigma \sqrt{n} \cdot \omega_n$.

2.4  The  SIS  and  LWE  Functions 

In  this  paper  we  are  interested  in  two  special  families  of  functions,  which  are  the  fundamental  building  blocks 
of lattice cryptography. Both families are parametrized by three integers $m$, $n$ and $q$, and a set $X \subseteq \mathbb{Z}^m$ of
short  vectors.  Usually  n  serves  as  a  security  parameter  and  m  and  q  are  functions  of  n. 

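The Short Integer Solution function family $\mathrm{SIS}(m, n, q, X)$ is the set of all functions $f_{\mathbf{A}}$ indexed by
$\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ with domain $X \subseteq \mathbb{Z}^m$ and range $Y = \mathbb{Z}_q^n$, defined as $f_{\mathbf{A}}(\mathbf{x}) = \mathbf{A}\mathbf{x} \bmod q$. The Learning
With Errors function family $\mathrm{LWE}(m, n, q, X)$ is the set of all functions $g_{\mathbf{A}}$ indexed by $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$ with
domain $\mathbb{Z}_q^n \times X$ and range $Y = \mathbb{Z}_q^m$, defined as $g_{\mathbf{A}}(\mathbf{s}, \mathbf{x}) = \mathbf{A}^T \mathbf{s} + \mathbf{x} \bmod q$. Both function families are
endowed with the uniform distribution over $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$. We omit the set $X$ from the notation $\mathrm{SIS}(m, n, q)$
and $\mathrm{LWE}(m, n, q)$ when clear from the context, or unimportant.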
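
The two function families can be written down directly. The sketch below is illustrative only: the names `sis` and `lwe` are hypothetical, the matrix is assumed to be given as a list of $n$ rows of length $m$, and the parameters are toy values with no security whatsoever.

```python
def sis(A, x, q):
    """SIS function f_A(x) = A x mod q, for an n-by-m matrix A (list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) % q for row in A]


def lwe(A, s, x, q):
    """LWE function g_A(s, x) = A^T s + x mod q, with secret s and error x."""
    n, m = len(A), len(A[0])
    return [(sum(A[i][j] * s[i] for i in range(n)) + x[j]) % q for j in range(m)]


# Toy instance with n = 2, m = 3, q = 7.
q = 7
A = [[1, 2, 3],
     [4, 5, 6]]
print(sis(A, [1, 0, 1], q))           # maps a short vector in Z^3 to Z_7^2
print(lwe(A, [1, 2], [1, 0, -1], q))  # maps (secret, error) to Z_7^3
```

Note the asymmetry in dimensions: the SIS function compresses an $m$-dimensional short input to $n$ coordinates, while the LWE function expands an $n$-dimensional secret plus a short error to $m$ coordinates.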

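In the context of collision resistance, we sometimes write $\mathrm{SIS}(m, n, q, \beta)$ for some real $\beta > 0$, without
an explicit domain $X$. Here the collision-finding problem is, given $\mathbf{A} \in \mathbb{Z}_q^{n \times m}$, to find distinct $\mathbf{x}, \mathbf{x}' \in \mathbb{Z}^m$
such that $\|\mathbf{x} - \mathbf{x}'\| \leq \beta$ and $f_{\mathbf{A}}(\mathbf{x}) = f_{\mathbf{A}}(\mathbf{x}')$. It is easy to see that this is equivalent to finding a nonzero
$\mathbf{z} \in \mathbb{Z}^m$ of length at most $\|\mathbf{z}\| \leq \beta$ such that $f_{\mathbf{A}}(\mathbf{z}) = \mathbf{0}$.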

For other security properties (e.g., one-wayness, uninvertibility, etc.), the most commonly used classes of
domains and input distributions $\mathcal{X}$ for SIS are the uniform distribution $\mathcal{U}(X)$ over the set $X = \{0, \ldots, s-1\}^m$
or $X = \{-s, \ldots, 0, \ldots, s\}^m$, and the discrete Gaussian distribution $D_{\mathbb{Z},s}^m$. Usually, this distribution is
restricted to the set of short vectors $X = \{\mathbf{x} \in \mathbb{Z}^m : \|\mathbf{x}\| \leq s\sqrt{m}\}$, which carries all but a $2^{-\Omega(m)}$ fraction
of the probability mass of $D_{\mathbb{Z},s}^m$.

For the LWE function family, the input is usually chosen according to distribution $\mathcal{U}(\mathbb{Z}_q^n) \times \mathcal{X}$, where $\mathcal{X}$
is  one  of  the  SIS  input  distributions.  This  makes  the  SIS  and  LWE  function  families  essentially  equivalent, 
as  shown  in  the  following  proposition. 

Proposition 2.9 ([15, 17]). For any $n$, $m \geq n + \omega(\log n)$, $q$, and distribution $\mathcal{X}$ over $\mathbb{Z}^m$, the $\mathrm{LWE}(m, n, q)$
function family is one-way (resp. pseudorandom, or uninvertible) with respect to input distribution $\mathcal{U}(\mathbb{Z}_q^n) \times \mathcal{X}$
if and only if the $\mathrm{SIS}(m, m - n, q)$ function family is one-way (resp. pseudorandom, or uninvertible) with
respect to the input distribution $\mathcal{X}$.

In  applications,  the  SIS  function  family  is  typically  used  with  larger  input  domains  X  for  which  the 
functions  are  surjective  but  not  injective,  while  the  LWE  function  family  is  used  with  smaller  domains  X  for 
which  the  functions  are  injective,  but  not  surjective.  The  results  in  this  paper  are  more  naturally  stated  using 
the  SIS  function  family,  so  we  will  use  the  SIS  formulation  to  establish  our  main  results,  and  then  reformulate 
them in terms of the LWE function family by invoking Proposition 2.9. We also use Proposition 2.9 to
reformulate  known  hardness  results  (from  worst-case  complexity  assumptions)  for  LWE  in  terms  of  SIS. 

Assuming the quantum worst-case hardness of standard lattice problems, Regev [24] showed that the
$\mathrm{LWE}(m, n, q)$ function family is hard to invert with respect to the discrete Gaussian error distribution $D_{\mathbb{Z},\sigma}^m$
for any $\sigma \geq 2\sqrt{n}$. (See also [21] for a classical reduction that requires $q$ to be exponentially large in $n$.
Because we are concerned with small parameters in this work, we focus mainly on the implications of the
quantum reduction.)

Proposition 2.10 ([24, Theorem 3.1]). For any $m = n^{O(1)}$, integer $q$ and real $\alpha \in (0, 1)$ such that $\alpha q >
2\sqrt{n}$, there is a polynomial time quantum reduction from sampling $D_{\Lambda,\sigma}$ (for any $n$-dimensional lattice $\Lambda$
and $\sigma > (\sqrt{2n}/\alpha)\,\eta(\Lambda)$) to inverting the $\mathrm{LWE}(m, n, q)$ function family on input $\mathcal{Y} = D_{\mathbb{Z},\alpha q}^m$.

Combining Propositions 2.8, 2.9 and 2.10 we get the following corollary.

Corollary 2.11. For any positive $m, n$ such that $\omega(\log n) \leq m - n \leq n^{O(1)}$ and $2\sqrt{n} < \sigma < q$, the
$\mathrm{SIS}(m, m - n, q)$ function family is uninvertible with respect to input distribution $D_{\mathbb{Z},\sigma}^m$, under the assumption
that no (quantum) algorithm can efficiently sample from a distribution statistically close to $D_{\Lambda,\sqrt{2n}\,q/\sigma}$.
In particular, assuming the worst-case (quantum) hardness of $\mathrm{SIVP}_{n \omega_n q/\sigma}$ over $n$-dimensional lattices,
the $\mathrm{SIS}(m, m - n, q)$ function family is uninvertible with respect to input distribution $D_{\mathbb{Z},\sigma}^m$.

We  use  the  fact  that  LWE/SIS  is  not  only  hard  to  invert,  but  also  pseudorandom.  This  is  proved  using 
search-to-decision  reductions  for  those  problems.  The  most  general  such  reductions  known  to  date  are  given 
in  the  following  two  theorems. 

Theorem 2.12 ([17]). For any positive $m, n$ such that $\omega(\log n) \leq m - n \leq n^{O(1)}$, any positive $\sigma \leq n^{O(1)}$,
and any $q$ with no divisors in the interval $((\sigma/\omega_n)^{m/k}, \sigma \cdot \omega_n)$, if $\mathrm{SIS}(m, m - n, q, D_{\mathbb{Z},\sigma}^m)$ is uninvertible,
then it is also pseudorandom.

Notice that when $\sigma \geq \omega_n^{(m+k)/(m-k)}$, the interval $((\sigma/\omega_n)^{m/k}, \sigma \cdot \omega_n)$ is empty, and Theorem 2.12 holds
without any restriction on the factorization of the modulus $q$.

Theorem 2.13 ([18]). Let $q$ have prime factorization $q = p_1^{e_1} \cdots p_k^{e_k}$ for pairwise distinct $\mathrm{poly}(n)$-bounded
primes $p_i$ with each $e_i \geq 1$, and let $0 < \alpha \leq 1/\omega_n$. If $\mathrm{LWE}(m, n, q, D_{\mathbb{Z},\alpha q}^m)$ is hard to invert for all
$m(n) = n^{O(1)}$, then $\mathrm{LWE}(m', n, q, D_{\mathbb{Z},\alpha' q}^{m'})$ is pseudorandom for any $m' = n^{O(1)}$ and

\[ \alpha' \geq \max\{\alpha,\ \omega_n^{1+1/\ell} \cdot \alpha^{1/\ell},\ \omega_n/p_1^{e_1}, \ldots, \omega_n/p_k^{e_k}\}, \]

where $\ell$ is an upper bound on the number of prime factors $p_i < \omega_n/\alpha'$.

In this work we focus on the use of Theorem 2.12 because it guarantees pseudorandomness for the same
value of $m$ as for the assumed one-wayness. This feature is important for applying our results from Section 4,
which guarantee one-wayness for particular values of $m$ (but not necessarily all $m = n^{O(1)}$).


Corollary 2.14. For any positive $m, n, \sigma, q$ such that $\omega(\log n) \leq m - n \leq n^{O(1)}$ and $2\sqrt{n} < \sigma < q < n^{O(1)}$,
if $q$ has no divisors in the range $((\sigma/\omega_n)^{1+n/k}, \sigma \cdot \omega_n)$, then the $\mathrm{SIS}(m, m - n, q)$ function family is
pseudorandom with respect to input distribution $D_{\mathbb{Z},\sigma}^m$, under the assumption that no (quantum) algorithm
can efficiently sample (up to negligible statistical errors) $D_{\Lambda,\sqrt{2n}\,q/\sigma}$.

In particular, assuming the worst-case (quantum) hardness of $\mathrm{SIVP}_{n \omega_n q/\sigma}$ on $n$-dimensional lattices, the
$\mathrm{SIS}(m, m - n, q)$ function family is pseudorandom with respect to input distribution $D_{\mathbb{Z},\sigma}^m$.

3  Hardness of SIS with Small Modulus


We  first  prove  a  simple  “success  amplification”  lemma  for  collision-finding  in  SIS,  which  says  that  any 
inverse-polynomial advantage can be amplified to essentially 1, at only the expense of a larger runtime and
value  of  m  (which  will  have  no  ill  effects  on  our  final  results).  Therefore,  for  the  remainder  of  this  section  we 
implicitly  restrict  our  attention  to  collision-finding  algorithms  that  have  overwhelming  advantage. 

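Lemma 3.1. For arbitrary $n$, $q$, $m$ and $X \subseteq \mathbb{Z}^m$, suppose there exists a probabilistic algorithm $\mathcal{A}$ that has
advantage $\epsilon > 0$ in collision-finding for $\mathrm{SIS}(m, n, q, X)$. Then there exists a probabilistic algorithm $\mathcal{B}$ that
has advantage $1 - (1 - \epsilon)^t \geq 1 - \exp(-\epsilon t) = 1 - \exp(-n)$ in collision-finding for $\mathrm{SIS}(M = t \cdot m, n, q, X')$,
where $t = n/\epsilon$ and $X' = \bigcup_{i=1}^{t} (\{0^m\}^{i-1} \times X \times \{0^m\}^{t-i})$. The runtime of $\mathcal{B}$ is essentially $t$ times that of $\mathcal{A}$.

Proof. The algorithm $\mathcal{B}$ simply partitions its input $\mathbf{A} \in \mathbb{Z}_q^{n \times M}$ into blocks $\mathbf{A}_i \in \mathbb{Z}_q^{n \times m}$ and invokes $\mathcal{A}$ (with
fresh random coins) on each of them, until $\mathcal{A}$ returns a valid collision $\mathbf{x}, \mathbf{x}' \in X$ for some $\mathbf{A}_i$. Then $\mathcal{B}$ returns

\[ (0^{m(i-1)}, \mathbf{x}, 0^{m(t-i)}),\ (0^{m(i-1)}, \mathbf{x}', 0^{m(t-i)}) \in X' \]

as a collision for $\mathbf{A}$. Clearly, $\mathcal{B}$ succeeds if any call to $\mathcal{A}$ succeeds. Since all $t$ calls to $\mathcal{A}$ are on independent
inputs $\mathbf{A}_i$ and use independent coins, some call will succeed, except with $(1 - \epsilon)^t$ probability.  □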
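
The amplification algorithm $\mathcal{B}$ from the proof can be sketched as follows. This is an illustration under stated assumptions rather than an implementation from the original text: the oracle interface (a callable that returns a collision pair or `None`), the brute-force oracle over binary vectors, and the `make_flaky` wrapper that simulates an unreliable $\mathcal{A}$ are all hypothetical, and the blocks are indexed from 0 in the code.

```python
from itertools import product


def brute_force_collision(A, q):
    """Toy oracle: exhaustively find distinct binary x, x' with A x = A x' mod q."""
    m = len(A[0])
    seen = {}
    for x in product((0, 1), repeat=m):
        y = tuple(sum(a * xi for a, xi in zip(row, x)) % q for row in A)
        if y in seen:
            return seen[y], x
        seen[y] = x
    return None


def make_flaky(oracle, failures):
    """Wrap an oracle so that its first `failures` calls return None."""
    state = {"left": failures}

    def flaky(block, q):
        if state["left"] > 0:
            state["left"] -= 1
            return None
        return oracle(block, q)

    return flaky


def amplify(oracle, A, q, m):
    """Algorithm B: try the oracle on each n-by-m block of the n-by-(t*m) input.

    On the first success in block i, pad the collision with zeros so that it
    becomes a collision for the full matrix A.
    """
    t = len(A[0]) // m
    for i in range(t):
        block = [row[i * m:(i + 1) * m] for row in A]
        result = oracle(block, q)
        if result is not None:
            x, x2 = result
            pad = lambda v: (0,) * (i * m) + tuple(v) + (0,) * ((t - i - 1) * m)
            return pad(x), pad(x2)
    return None


# Toy run: n = 1, q = 3, m = 3, t = 4, with an oracle that fails twice.
q, m = 3, 3
A = [[1, 2, 0, 2, 2, 1, 1, 1, 1, 0, 1, 2]]
collision = amplify(make_flaky(brute_force_collision, 2), A, q, m)
print(collision)
```

The padded vectors differ only inside a single block, which is exactly the structure of the set $X'$ in the lemma.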

3.1  SIS-to-SIS  Reduction 

Our first proof that the $\mathrm{SIS}(m, n, q, \beta)$ function family is collision resistant for moduli $q$ as small as $n^{1/2+\delta}$
proceeds by a reduction between SIS problems with different parameters. Previous hardness results based on
worst-case lattice assumptions require the modulus $q$ to be at least $\beta \cdot \omega(\sqrt{n \log n})$ [12, Theorem 9.2], and
$\beta \geq \sqrt{n \log q}$ is needed to guarantee that a nontrivial solution exists. For such parameters, SIS is collision
resistant assuming the hardness of approximating worst-case lattice problems to within $\approx \beta\sqrt{n}$ factors.

The intuition behind our proof for smaller moduli is easily explained. We reduce SIS with modulus $q^c$
and solution bound $\beta^c$ (for any constant integer $c \geq 1$) to SIS with modulus $q$ and bound $\beta$. Then as long as
$(q/\beta)^c \geq \omega(\sqrt{n \log n})$, the former problem enjoys worst-case hardness, hence so does the latter. Thus we can
take $q = \beta \cdot n^\delta$ for any constant $\delta > 0$, and $c > 1/(2\delta)$. Notice, however, that the underlying approximation
factor for worst-case lattice problems is $\approx \beta^c \sqrt{n} \geq n^{1/2 + 1/(4\delta)}$, which, while still polynomial, degrades
severely as $\delta$ approaches 0. In the next subsection we give a direct reduction from worst-case lattice problems
to SIS with a small modulus, which does not have this drawback.
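
The parameter trade-off in this paragraph is simple arithmetic: with $\beta \geq \sqrt{n}$ and $c > 1/(2\delta)$, one has $\beta^c \sqrt{n} \geq n^{c/2 + 1/2} > n^{1/2 + 1/(4\delta)}$. The short sketch below tabulates it; the function name is hypothetical, and choosing the smallest admissible integer $c$ is an assumption made here for illustration.

```python
import math
from fractions import Fraction


def sis_to_sis_parameters(delta):
    """For q = beta * n^delta, return (c, e) where c is the smallest integer
    with c > 1/(2*delta) and e = 1/2 + 1/(4*delta) is the exponent in the
    lower bound n^e on the worst-case approximation factor."""
    delta = Fraction(delta)
    c = math.floor(1 / (2 * delta)) + 1
    exponent = Fraction(1, 2) + 1 / (4 * delta)
    return c, exponent


for delta in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 10), Fraction(1, 100)):
    c, e = sis_to_sis_parameters(delta)
    print(f"delta = {delta}: c = {c}, approximation factor at least n^{e}")
```

Running the loop shows how quickly the exponent grows as $\delta$ shrinks, which is the degradation the text refers to.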

The above discussion is formalized in the following proposition. For technical reasons, we prove that
$\mathrm{SIS}(m, n, q, X)$ is collision resistant assuming that the domain $X$ has the property that all SIS solutions
$\mathbf{z} \in (X - X) \setminus \{\mathbf{0}\}$ satisfy $\gcd(\mathbf{z}, q) = 1$. This restriction is satisfied in many (but not all) common settings,
e.g., when $q > \beta$ is prime, or when $X \subseteq \{0, 1\}^m$ is a set of binary vectors.

Proposition 3.2. Let $n$, $q$, $m$, $\beta$ and $X \subseteq \mathbb{Z}^m$ be such that $\gcd(\mathbf{x} - \mathbf{x}', q) = 1$ and $\|\mathbf{x} - \mathbf{x}'\| \leq \beta$ for any
distinct $\mathbf{x}, \mathbf{x}' \in X$. For any positive integer $c$, there is a deterministic reduction from collision-finding for
$\mathrm{SIS}(m^c, n, q^c, \beta^c)$ to collision-finding for $\mathrm{SIS}(m, n, q, X)$ (in both cases, with overwhelming advantage).
The reduction runs in time polynomial in its input size, and makes fewer than $m^c$ calls to its oracle.

Proof. Let $\mathcal{A}$ be an efficient algorithm that finds a collision for $\mathrm{SIS}(m, n, q, X)$ with overwhelming advantage.
We use it to find a nonzero solution for $\mathrm{SIS}(m^c, n, q^c, \beta^c)$. Let $\mathbf{A} \in \mathbb{Z}_{q^c}^{n \times m^c}$ be an input SIS instance. Partition
the columns of $\mathbf{A}$ into $m^{c-1}$ blocks $\mathbf{A}_i \in \mathbb{Z}_{q^c}^{n \times m}$, and for each one, invoke $\mathcal{A}$ to find a collision modulo $q$,
i.e., a pair of distinct vectors $\mathbf{x}_i, \mathbf{x}_i' \in X$ such that $\mathbf{A}_i \mathbf{z}_i = \mathbf{0} \bmod q$, where $\mathbf{z}_i = \mathbf{x}_i - \mathbf{x}_i'$ and $\|\mathbf{z}_i\| \leq \beta$.
For each $i$, since $\gcd(\mathbf{z}_i, q) = 1$ and $\mathbf{A}_i \mathbf{z}_i = \mathbf{0} \bmod q$, the vector $\mathbf{a}_i' = (\mathbf{A}_i \mathbf{z}_i)/q \in \mathbb{Z}_{q^{c-1}}^n$ is uniformly
random, even after conditioning on $\mathbf{z}_i$ and $\mathbf{A}_i \bmod q$. So, the matrix $\mathbf{A}' \in \mathbb{Z}_{q^{c-1}}^{n \times m^{c-1}}$ made up of all these
columns is uniformly random. By induction on $c$, using $\mathcal{A}$ we can find a nonzero solution $\mathbf{z}' \in \mathbb{Z}^{m^{c-1}}$ such
that $\mathbf{A}'\mathbf{z}' = \mathbf{0} \bmod q^{c-1}$ and $\|\mathbf{z}'\| \leq \beta^{c-1}$. Then it is easy to verify that a nonzero solution for the original
instance $\mathbf{A}$ is given by $\mathbf{z} = (z_1' \cdot \mathbf{z}_1, \ldots, z_{m^{c-1}}' \cdot \mathbf{z}_{m^{c-1}}) \in \mathbb{Z}^{m^c}$, and that $\|\mathbf{z}\| \leq \|\mathbf{z}'\| \cdot \max_i \|\mathbf{z}_i\| \leq \beta^c$.
Finally, the total number of calls to $\mathcal{A}$ is $\sum_{i=0}^{c-1} m^i < m^c$, as claimed.  □
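
The recursive structure of this reduction can be exercised on toy parameters. The sketch below is an illustration, not the paper's implementation: it assumes the binary domain $X = \{0, 1\}^m$ (so every nonzero difference has entries in $\{-1, 0, 1\}$ and $\gcd(\mathbf{z}_i, q) = 1$ holds automatically), uses a hypothetical exhaustive-search oracle `find_collision` in place of $\mathcal{A}$, and requires $2^m > q^n$ so that a collision always exists.

```python
from itertools import product


def find_collision(A, q):
    """Toy oracle for SIS(m, n, q, {0,1}^m): exhaustive search for a collision.

    Entries of A are reduced mod q implicitly when computing A x mod q.
    """
    m = len(A[0])
    seen = {}
    for x in product((0, 1), repeat=m):
        y = tuple(sum(a * xi for a, xi in zip(row, x)) % q for row in A)
        if y in seen:
            return seen[y], x
        seen[y] = x
    raise ValueError("no collision found; the sketch needs 2**m > q**n")


def solve_power_modulus(A, q, c, m):
    """Return a nonzero z with A z = 0 mod q**c, where A has m**c columns."""
    if c == 1:
        x, x2 = find_collision(A, q)
        return [a - b for a, b in zip(x, x2)]
    blocks = [[row[i:i + m] for row in A] for i in range(0, len(A[0]), m)]
    zs, columns = [], []
    for block in blocks:
        x, x2 = find_collision(block, q)
        z = [a - b for a, b in zip(x, x2)]  # A_i z = 0 mod q
        zs.append(z)
        # A_i z is an exact integer multiple of q: divide it out and
        # keep the result modulo q**(c-1) as a column of the new instance.
        columns.append([(sum(a * zi for a, zi in zip(row, z)) // q) % q ** (c - 1)
                        for row in block])
    A_prime = [[col[r] for col in columns] for r in range(len(A))]
    z_prime = solve_power_modulus(A_prime, q, c - 1, m)
    # Scale each block solution z_i by the matching coordinate of z'.
    return [zp * zi for zp, z in zip(z_prime, zs) for zi in z]


# Toy run: n = 1, q = 3, m = 3, c = 2, so A has 9 columns modulo 9.
A = [[1, 4, 7, 2, 8, 5, 0, 3, 6]]
z = solve_power_modulus(A, q=3, c=2, m=3)
print(z, sum(a * zi for a, zi in zip(A[0], z)) % 9)
```

The output vector is nonzero because $\mathbf{z}'$ has a nonzero coordinate and every block solution $\mathbf{z}_i$ is nonzero.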

3.2  Direct  Reduction 

As mentioned above, the large worst-case approximation factor associated with the use of Proposition 3.2 is
undesirable, as is (to a lesser extent) the restriction that $\gcd(X - X, q) = 1$. To eliminate these drawbacks, we
next give a direct proof that SIS is collision resistant for small $q$, based on the assumed hardness of worst-case
lattice problems. The underlying approximation factor for these problems can be as small as $\tilde{O}(\beta\sqrt{n})$, which
matches the best known factors obtained by previous proofs (which require a larger modulus $q$). Our new
proof combines ideas from [19, 12] and Proposition 3.2, as well as a new convolution theorem for discrete
Gaussians which strengthens similar ones previously proved in [22, 6].

Our  proof  of  the  convolution  theorem  is  substantially  different  and,  we  believe,  technically  simpler  than 
the  prior  ones.  In  particular,  it  handles  the  sum  of  many  Gaussian  samples  all  at  once,  whereas  previous  proofs 
used  induction  from  a  base  case  of  two  samples.  With  the  inductive  approach,  it  is  technically  complex  to 
verify  that  all  the  intermediate  Gaussian  parameters  (which  involve  harmonic  means)  satisfy  the  hypotheses. 
Moreover,  the  intermediate  parameters  can  depend  on  the  order  in  which  the  samples  are  added  in  the 
induction,  leading  to  unnecessarily  strong  hypotheses  on  the  original  parameters. 

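Theorem 3.3. Let $\Lambda$ be an $n$-dimensional lattice, $\mathbf{z} \in \mathbb{Z}^m$ a nonzero integer vector, $s_i \ge \sqrt{2}\,\|\mathbf{z}\|_\infty \cdot \eta(\Lambda)$,
and $\Lambda + \mathbf{c}_i$ arbitrary cosets of $\Lambda$ for $i = 1, \ldots, m$. Let $\mathbf{y}_i$ be independent vectors with distributions $D_{\Lambda + \mathbf{c}_i, s_i}$,
respectively. Then the distribution of $\mathbf{y} = \sum_i z_i \mathbf{y}_i$ is statistically close to $D_{Y,s}$, where $Y = \gcd(\mathbf{z})\Lambda + \mathbf{c}$,
$\mathbf{c} = \sum_i z_i \mathbf{c}_i$, and $s = \sqrt{\sum_i (z_i s_i)^2}$.

In particular, if $\gcd(\mathbf{z}) = 1$ and $\sum_i z_i \mathbf{c}_i \in \Lambda$, then $\mathbf{y}$ is distributed statistically close to $D_{\Lambda,s}$.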

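Proof. First we verify that the support of $\mathbf{y}$ is

\[ \sum_i z_i(\Lambda + \mathbf{c}_i) = \sum_i z_i \Lambda + \sum_i z_i \cdot \mathbf{c}_i = \gcd(\mathbf{z})\Lambda + \sum_i z_i \cdot \mathbf{c}_i = Y. \]

So it remains to prove that each $\bar{\mathbf{y}} \in Y$ has probability (nearly) proportional to $\rho_s(\bar{\mathbf{y}})$.

For the remainder of the proof we use the following convenient scaling. Define the diagonal matrices
$\mathbf{S} = \mathrm{diag}(s_1, \ldots, s_m)$ and $\mathbf{S}' = \mathbf{S} \otimes \mathbf{I}_n$, and the $mn$-dimensional lattice $\Lambda' = \bigoplus_i (s_i^{-1}\Lambda) = (\mathbf{S}')^{-1} \cdot \Lambda^{\oplus m}$,
where $\bigoplus$ denotes the (external) direct sum of lattices and $\Lambda^{\oplus m} = \mathbb{Z}^m \otimes \Lambda$ is the direct sum of $m$ copies
of $\Lambda$. Then by independence of the $\mathbf{y}_i$, it can be seen that $\mathbf{y}' = (\mathbf{S}')^{-1} \cdot (\mathbf{y}_1, \ldots, \mathbf{y}_m)$ has discrete Gaussian
distribution $D_{\Lambda' + \mathbf{c}'}$ (with parameter 1), where $\mathbf{c}' = (\mathbf{S}')^{-1} \cdot (\mathbf{c}_1, \ldots, \mathbf{c}_m)$.

The output vector $\mathbf{y} = \sum_i z_i \mathbf{y}_i$ can be expressed, using the mixed-product property for Kronecker
products, as

\[ \mathbf{y} = (\mathbf{z}^T \otimes \mathbf{I}_n) \cdot (\mathbf{y}_1, \ldots, \mathbf{y}_m) = (\mathbf{z}^T \otimes \mathbf{I}_n) \cdot \mathbf{S}' \cdot \mathbf{y}' = ((\mathbf{z}^T\mathbf{S}) \otimes \mathbf{I}_n) \cdot \mathbf{y}'. \]

So, letting $\mathbf{Z} = ((\mathbf{z}^T\mathbf{S}) \otimes \mathbf{I}_n)$, we want to prove that the distribution of $\mathbf{y} \sim \mathbf{Z} \cdot D_{\Lambda' + \mathbf{c}'}$ is statistically close
to $D_{Y,s}$.

Fix any vectors $\mathbf{x}' \in \Lambda' + \mathbf{c}'$ and $\bar{\mathbf{y}} = \mathbf{Z}\mathbf{x}' \in Y$, and define the proper sublattice

\[ L = \{\mathbf{v} \in \Lambda' : \mathbf{Z}\mathbf{v} = \mathbf{0}\} = \Lambda' \cap \ker(\mathbf{Z}) \subsetneq \Lambda'. \]

It is immediate to verify that the set of all $\mathbf{y}' \in \Lambda' + \mathbf{c}'$ such that $\mathbf{Z}\mathbf{y}' = \bar{\mathbf{y}}$ is $(\Lambda' + \mathbf{c}') \cap (\mathbf{x}' + \ker(\mathbf{Z})) = L + \mathbf{x}'$.
Let $\mathbf{x}$ be the orthogonal projection of $\mathbf{x}'$ onto $\ker(\mathbf{Z}) \supset L$. Then we have

\[ \Pr[\mathbf{y} = \bar{\mathbf{y}}] = \frac{\rho(L + \mathbf{x}')}{\rho(\Lambda' + \mathbf{c}')} = \rho(\mathbf{x}' - \mathbf{x}) \cdot \frac{\rho(L + \mathbf{x})}{\rho(\Lambda' + \mathbf{c}')}. \]

Below we show that $\eta(L) \le 1$, which implies that $\rho(L + \mathbf{x})$ is essentially the same for all values of $\mathbf{x}'$, and
hence for all $\bar{\mathbf{y}}$. Therefore, we just need to analyze $\rho(\mathbf{x}' - \mathbf{x})$.

Since $\mathbf{Z}^T$ is an orthogonal basis for $\ker(\mathbf{Z})^{\perp}$, each of whose columns has Euclidean norm $s =
(\sum_i (z_i s_i)^2)^{1/2}$, we have $\mathbf{x}' - \mathbf{x} = (\mathbf{Z}^T\mathbf{Z}\mathbf{x}')/s^2$, and

\[ \|\mathbf{x}' - \mathbf{x}\|^2 = \langle \mathbf{x}', \mathbf{Z}^T\mathbf{Z}\mathbf{x}' \rangle / s^2 = \|\mathbf{Z}\mathbf{x}'\|^2 / s^2 = (\|\bar{\mathbf{y}}\|/s)^2. \]

Therefore, $\rho(\mathbf{x}' - \mathbf{x}) = \rho_s(\bar{\mathbf{y}})$, and so $\Pr[\mathbf{y} = \bar{\mathbf{y}}]$ is essentially proportional to $\rho_s(\bar{\mathbf{y}})$, i.e., the statistical
distance between $\mathbf{y}$ and $D_{Y,s}$ is negligible.

It remains to bound the smoothing parameter of $L$. Consider the $m$-dimensional integer lattice $\mathcal{Z} =
\mathbb{Z}^m \cap \ker(\mathbf{z}^T) = \{\mathbf{v} \in \mathbb{Z}^m : \langle \mathbf{z}, \mathbf{v} \rangle = 0\}$. Because $(\mathcal{Z} \otimes \Lambda) \subseteq (\mathbb{Z}^m \otimes \Lambda)$ and $\mathbf{S}^{-1}\mathcal{Z} \subset \ker(\mathbf{z}^T\mathbf{S})$, it is
straightforward to verify from the definitions that

\[ (\mathbf{S}')^{-1} \cdot (\mathcal{Z} \otimes \Lambda) = ((\mathbf{S}^{-1}\mathcal{Z}) \otimes \Lambda) \]

is a sublattice of $L$. It follows from Corollary 2.7 and by scaling that

\[ \eta(L) \le \eta((\mathbf{S}')^{-1} \cdot (\mathcal{Z} \otimes \Lambda)) \le \eta(\Lambda) \cdot \widetilde{bl}(\mathcal{Z}) / \min_i s_i. \]

Finally, $\widetilde{bl}(\mathcal{Z}) \le \min\{\|\mathbf{z}\|, \sqrt{2}\,\|\mathbf{z}\|_\infty\}$ because $\mathcal{Z}$ has a full-rank set of vectors $z_i \cdot \mathbf{e}_j - z_j \cdot \mathbf{e}_i$, where index
$i$ minimizes $|z_i| \ne 0$, and $j$ ranges over $\{1, \ldots, m\} \setminus \{i\}$. By assumption on the $s_i$, we have $\eta(L) \le 1$ as
desired, and the proof is complete. □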
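The statement of the convolution theorem can be checked numerically in the simplest setting. The sketch below is only an illustration under stated assumptions: it takes $\Lambda = \mathbb{Z}$, all cosets trivial ($\mathbf{c}_i = 0$), $\mathbf{z} = (1, 2)$ and $s_1 = s_2 = 10$, which should comfortably exceed $\sqrt{2}\,\|\mathbf{z}\|_\infty \cdot \eta_\epsilon(\mathbb{Z})$ if $\eta_\epsilon(\mathbb{Z})$ is about 3 for $\epsilon \approx 2^{-40}$. The truncation bound and all function names are choices made for this example.

```python
import math
from functools import reduce

def rho(x, s):
    """Gaussian function rho_s(x) = exp(-pi * x^2 / s^2)."""
    return math.exp(-math.pi * (x / s) ** 2)

def discrete_gaussian(s, bound, step=1):
    """Probability table of D_{step*Z, s}, truncated to |x| <= bound."""
    support = range(-(bound // step) * step, bound + 1, step)
    weights = {x: rho(x, s) for x in support}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def integer_combination(z, widths, bound):
    """Exact distribution of sum_i z_i * y_i for independent y_i ~ D_{Z, s_i}."""
    dist = {0: 1.0}
    for z_i, s_i in zip(z, widths):
        sample = discrete_gaussian(s_i, bound)
        new = {}
        for t, p in dist.items():
            for x, p_x in sample.items():
                new[t + z_i * x] = new.get(t + z_i * x, 0.0) + p * p_x
        dist = new
    return dist

def statistical_distance(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

z = (1, 2)
widths = (10.0, 10.0)
s = math.sqrt(sum((z_i * s_i) ** 2 for z_i, s_i in zip(z, widths)))  # sqrt(sum (z_i s_i)^2)
g = reduce(math.gcd, z)                                              # support is gcd(z) * Z
actual = integer_combination(z, widths, bound=150)
target = discrete_gaussian(s, bound=max(abs(t) for t in actual), step=g)
distance = statistical_distance(actual, target)
```

In this example the combined width is $s = \sqrt{500}$, and `distance` measures how far the exact distribution of $y_1 + 2y_2$ is from $D_{\mathbb{Z}, s}$.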


Remark 3.4. Although we will not need it in this work, we note that the statement and proof of Theorem 3.3
can be adapted to the case where the $\mathbf{y}_i$ respectively have non-spherical discrete Gaussian distributions
$D_{\Lambda_i + \mathbf{c}_i, \sqrt{\Sigma_i}}$ with positive definite "covariance" parameters $\Sigma_i \in \mathbb{R}^{n \times n}$, over cosets of possibly different
lattices $\Lambda_i$. (See [22] for a formal definition of these distributions.)

In this setting, by scaling $\Lambda_i$ and $\Sigma_i$ we can assume without loss of generality that $\mathbf{z} = (1, 1, \ldots, 1)$.
The theorem statement says that $\mathbf{y}$'s distribution is close to a discrete Gaussian (over an appropriate lattice
coset) with covariance parameter $\Sigma = \sum_i \Sigma_i$, under mild assumptions on the $\sqrt{\Sigma_i}$. In the proof we simply
let $\mathbf{S}'$ be the block-diagonal matrix with the $\sqrt{\Sigma_i}$ as its diagonal blocks, let $\Lambda' = (\mathbf{S}')^{-1} \cdot \bigoplus_i \Lambda_i$, and let
$\mathbf{Z} = (\mathbf{z}^T \otimes \mathbf{I}_n) \cdot \mathbf{S}' = [\sqrt{\Sigma_1} \mid \cdots \mid \sqrt{\Sigma_m}]$. Then the only technical difference is in bounding the smoothing
parameter of $L$.


The  convolution  theorem  implies  the  following  simple  but  useful  lemma,  which  shows  how  to  convert 
samples  having  a  broad  range  of  parameters  into  ones  having  parameters  in  a  desired  narrow  range. 

Lemma 3.5. There is an efficient algorithm which, given a basis $\mathbf{B}$ of some lattice $\Lambda$, some $R \ge \sqrt{2}$ and
samples $(\mathbf{y}_i, s_i)$ where each $s_i \in [\sqrt{2}, R] \cdot \eta(\Lambda)$ and each $\mathbf{y}_i$ has distribution $D_{\Lambda, s_i}$, with overwhelming
probability outputs a sample $(\mathbf{y}, s)$ where $s \in [R, \sqrt{2}R] \cdot \eta(\Lambda)$ and $\mathbf{y}$ has distribution statistically close
to $D_{\Lambda,s}$.

Proof. Let $\omega_n = \omega(\sqrt{\log n})$ satisfy $\omega_n \le \sqrt{n}$. The algorithm draws $2n^2$ input samples, and works as
follows: if at least $n^2$ of the samples have parameters $s_i \le R \cdot \eta(\Lambda)/(\sqrt{n} \cdot \omega_n)$, then with overwhelming
probability they all have lengths bounded by $R \cdot \eta(\Lambda)/\omega_n$ and they include $n$ linearly independent vectors.
Using such vectors we can construct a basis $\mathbf{S}$ such that $\|\tilde{\mathbf{S}}\| \le R \cdot \eta(\Lambda)/\omega_n$, and with the sampling algorithm
of [12, Theorem 4.1] we can generate samples having parameter $R \cdot \eta(\Lambda)$.

Otherwise, at least $n^2$ of the samples $(\mathbf{y}_i, s_i)$ have parameters $s_i \ge \max\{R/n, \sqrt{2}\} \cdot \eta(\Lambda)$. Then by
summing an appropriate subset of those $\mathbf{y}_i$, by the convolution theorem we can obtain a sample having
parameter in the desired range. □
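One natural way to pick "an appropriate subset" in the second case is a greedy accumulation of squared widths; the proof does not spell out the selection rule, so the following is a sketch of one possible choice rather than the authors' procedure. Widths are expressed in units of $\eta(\Lambda)$, and `choose_subset` is a name introduced here. Because every $s_i \le R$ and the running total is below $R^2$ before the last addition, the final total stays below $2R^2$; because every $s_i \ge R/n$, at most $n^2$ samples are needed.

```python
import math

def choose_subset(widths, R):
    """Greedily add samples until sqrt(sum s_i^2) first reaches R.

    Assumes every width lies in [max(R/n, sqrt(2)), R], in units of eta(Lambda).
    Returns the chosen indices and the width of the summed sample.
    """
    chosen, total_sq = [], 0.0
    for index, s_i in enumerate(widths):
        chosen.append(index)
        total_sq += s_i * s_i
        if total_sq >= R * R:
            # The total was < R^2 before this step and s_i <= R, so total_sq < 2 R^2.
            return chosen, math.sqrt(total_sq)
    raise ValueError("not enough samples to reach width R")

indices, s = choose_subset([3.0, 7.0, 9.5, 4.0], R=10.0)
```

Summing the selected $\mathbf{y}_i$ corresponds to applying Theorem 3.3 with a 0/1 coefficient vector, for which $\|\mathbf{z}\|_\infty = 1$ and the hypothesis $s_i \ge \sqrt{2} \cdot \eta(\Lambda)$ suffices.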

The  next  lemma  is  the  heart  of  our  reduction.  The  novel  part,  corresponding  to  the  properties  described  in 
the  second  item,  is  a  way  of  using  a  collision-finding  oracle  to  reduce  the  Gaussian  width  of  samples  drawn 
from  a  lattice.  The  first  item  corresponds  to  the  guarantees  provided  by  previous  reductions. 

Lemma 3.6. Let $m, n$ be integers, $S = \{\mathbf{z} \in \mathbb{Z}^m \setminus \{\mathbf{0}\} \mid \|\mathbf{z}\| \le \beta \wedge \|\mathbf{z}\|_\infty \le \beta_\infty\}$ for some real $\beta \ge \beta_\infty > 0$,
and $q$ an integer modulus with at most $\mathrm{poly}(n)$ integer divisors less than $\beta_\infty$. There is a probabilistic
polynomial time reduction that, on input any basis $\mathbf{B}$ of a lattice $\Lambda$ and sufficiently many samples $(\mathbf{y}_i, s_i)$
where $s_i \ge \sqrt{2}q \cdot \eta(\Lambda)$ and $\mathbf{y}_i$ has distribution $D_{\Lambda, s_i}$, and given access to an $\mathrm{SIS}(m, n, q, S)$ oracle (that
finds collisions $\mathbf{z} \in S$ with nonnegligible probability) outputs (with overwhelming probability) a sample
$(\mathbf{y}, s)$ with $\min_i s_i/q \le s \le (\beta/q) \cdot \max_i s_i$, and $\mathbf{y} \in \Lambda$ such that:

• $E[\|\mathbf{y}\|] \le (\beta\sqrt{n}/q) \cdot \max_i s_i$, and for any subspace $H \subset \mathbb{R}^n$ of dimension at most $n - 1$, with
probability at least 1/10 we have $\mathbf{y} \notin H$.

• Moreover, if each $s_i \ge \sqrt{2}\beta_\infty q \cdot \eta(\Lambda)$, then the distribution of $\mathbf{y}$ is statistically close to $D_{\Lambda,s}$.

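Proof. Let $\mathcal{A}$ be the collision-finding oracle. Without loss of generality, we can assume that whenever $\mathcal{A}$
outputs a valid collision $\mathbf{z} \in S$, we have that $\gcd(\mathbf{z})$ divides $q$. This is so because for any integer vector
$\mathbf{z}$, if $\mathbf{A}\mathbf{z} = \mathbf{0} \bmod q$ then also $\mathbf{A}((g/d)\mathbf{z}) = \mathbf{0} \bmod q$, where $d = \gcd(\mathbf{z})$ and $g = \gcd(d, q)$. Moreover,
$(g/d)\mathbf{z} \in S$ holds true and $\gcd((g/d)\mathbf{z}) = \gcd(\mathbf{z}, q)$ divides $q$. Let $d$ be such that $\mathcal{A}$ outputs, with
non-negligible probability, a valid collision $\mathbf{z}$ satisfying $\gcd(\mathbf{z}) = d$. Such a $d$ exists because $\gcd(\mathbf{z})$ is bounded
by $\beta_\infty$ and divides $q$, so by assumption there are only polynomially many possible values of $d$. Let $q' = q/d$,
which is an integer. By increasing $m$ and using standard amplification techniques, we can make the probability
that $\mathcal{A}$ outputs such a collision (satisfying $\mathbf{z} \in S$, $\mathbf{A}\mathbf{z} = \mathbf{0} \pmod q$ and $\gcd(\mathbf{z}) = d$) exponentially close
to 1.

Let $(\mathbf{y}_i, s_i)$ for $i = 1, \ldots, m$ be input samples, where $\mathbf{y}_i$ has distribution $D_{\Lambda, s_i}$. Write each $\mathbf{y}_i$ as
$\mathbf{y}_i = \mathbf{B}\mathbf{a}_i \bmod q'\Lambda$ for $\mathbf{a}_i \in \mathbb{Z}_{q'}^n$. Since $s_i \ge q'\eta(\Lambda)$ the distribution of $\mathbf{a}_i$ is statistically close to uniform over
$\mathbb{Z}_{q'}^n$. Let $\mathbf{A} = [\mathbf{a}_1 \mid \cdots \mid \mathbf{a}_m] \in \mathbb{Z}_{q'}^{n \times m}$, and choose $\mathbf{A}' \in \mathbb{Z}_d^{n \times m}$ uniformly at random. Since $\mathbf{A}$ is statistically
close to uniform over $\mathbb{Z}_{q'}^{n \times m}$, the matrix $\mathbf{A} + q'\mathbf{A}'$ is statistically close to uniform over $\mathbb{Z}_q^{n \times m}$. Call the oracle
$\mathcal{A}$ on input $\mathbf{A} + q'\mathbf{A}'$, and obtain (with overwhelming probability) a nonzero $\mathbf{z} \in S$ with $\gcd(\mathbf{z}) = d$, $\|\mathbf{z}\| \le \beta$,
$\|\mathbf{z}\|_\infty \le \beta_\infty$ and $(\mathbf{A} + q'\mathbf{A}')\mathbf{z} = \mathbf{0} \bmod q$. Notice that $q'\mathbf{A}'\mathbf{z} = q\mathbf{A}'(\mathbf{z}/d) = \mathbf{0} \bmod q$ because $(\mathbf{z}/d)$ is an
integer vector. Therefore $\mathbf{A}\mathbf{z} = \mathbf{0} \bmod q$. Finally, the reduction outputs $(\mathbf{y}, s)$, where $\mathbf{y} = \sum_i z_i \mathbf{y}_i / q$ and
$s = \sqrt{\sum_i (s_i z_i)^2}/q$. Notice that $z_i \mathbf{y}_i \in q\Lambda + \mathbf{B}(z_i \mathbf{a}_i)$ because $\gcd(\mathbf{z}) = d$, so $\mathbf{y} \in \Lambda$.

Notice that $s$ satisfies the stated bounds because $\mathbf{z}$ is a nonzero integer vector. We next analyze the
distribution of $\mathbf{y}$. For any fixed $\mathbf{a}_i$, the conditional distribution of each $\mathbf{y}_i$ is $D_{q'\Lambda + \mathbf{B}\mathbf{a}_i, s_i}$, where $s_i \ge
\sqrt{2}\,\eta(q'\Lambda)$. The claim on $E[\|\mathbf{y}\|]$ then follows from [19, Lemma 2.11 and Lemma 4.3] and Hölder's inequality.
The claim on the probability that $\mathbf{y} \notin H$ was initially shown in the preliminary version of [19]; see also [24,
Lemma 3.15].

Now assume that $s_i \ge \sqrt{2}\beta_\infty q \cdot \eta(\Lambda) \ge \sqrt{2}\,\|\mathbf{z}\|_\infty \cdot \eta(q'\Lambda)$ for all $i$. By Theorem 3.3 the distribution of
$\mathbf{y}$ is statistically close to $D_{Y/q, s}$ where $Y = \gcd(\mathbf{z}) \cdot q'\Lambda + \mathbf{B}(\mathbf{A}\mathbf{z})$. Using $\mathbf{A}\mathbf{z} = \mathbf{0} \bmod q$ and $\gcd(\mathbf{z}) = d$,
we get $Y = q\Lambda$. Therefore $\mathbf{y}$ has distribution statistically close to $D_{\Lambda,s}$, as claimed. □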
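The algebra behind the width-reduction step can be illustrated on a toy instance. This is a hedged sketch, not the reduction itself: it fixes $d = 1$ (so $q' = q$ and no masking matrix is needed), replaces the SIS oracle by a brute-force search over $\{-1, 0, 1\}^m$, and takes the input samples as given integer coordinate vectors rather than drawing them from a discrete Gaussian. The names `short_kernel_vector` and `narrow_sample`, the basis and the sample values are all made up for the example.

```python
import math
from itertools import product

def short_kernel_vector(A, q):
    """Brute-force stand-in for the oracle: nonzero z in {-1,0,1}^m with A z = 0 mod q."""
    rows, m = len(A), len(A[0])
    for z in product((-1, 0, 1), repeat=m):
        if any(z) and all(sum(A[r][j] * z[j] for j in range(m)) % q == 0 for r in range(rows)):
            return list(z)
    raise ValueError("no short kernel vector found")

def narrow_sample(B, coeffs, widths, q):
    """coeffs[i] holds the integer coordinates c_i of y_i with respect to B (y_i = B c_i)."""
    n, m = len(B), len(coeffs)
    # Column i of A is c_i mod q, i.e. y_i = B a_i mod q*Lambda.
    A = [[coeffs[i][r] % q for i in range(m)] for r in range(n)]
    z = short_kernel_vector(A, q)
    combo = [sum(z[i] * coeffs[i][r] for i in range(m)) for r in range(n)]
    assert all(v % q == 0 for v in combo)      # A z = 0 mod q, so division by q is exact
    new_coeffs = [v // q for v in combo]
    y = [sum(B[r][k] * new_coeffs[k] for k in range(n)) for r in range(n)]   # y = sum z_i y_i / q
    s = math.sqrt(sum((w * z_i) ** 2 for w, z_i in zip(widths, z))) / q
    return y, s, z

B = [[2, 1], [0, 3]]                                   # columns are the basis vectors
coeffs = [(7, -3), (12, 4), (-6, 11), (3, 8), (9, -14)]
widths = [40.0, 42.0, 45.0, 41.0, 44.0]
y, s, z = narrow_sample(B, coeffs, widths, q=5)
```

Since $2^5 > 5^2$, a collision in $\{0,1\}^5$ (and hence a kernel vector in $\{-1,0,1\}^5$) always exists for these sizes, and the returned width obeys $\min_i s_i/q \le s \le (\|\mathbf{z}\|/q) \cdot \max_i s_i$.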


Building on Lemma 3.6, our next lemma shows that for any $q \ge \beta \cdot n^{\Omega(1)}$, a collision-finding oracle can
be used to obtain Gaussian samples of width close to $2\beta\beta_\infty \cdot \eta(\Lambda)$.

Lemma 3.7. Let $m, n, q, S$ be as in Lemma 3.6 and also assume $q/\beta \ge n^{\delta}$ for some constant $\delta > 0$. There is
an efficient reduction that, on input any basis $\mathbf{B}$ of an $n$-dimensional lattice $\Lambda$, an upper bound $\eta \ge \eta(\Lambda)$, and
given access to an $\mathrm{SIS}(m, n, q, S)$ oracle (finding collisions $\mathbf{z} \in S$ with nonnegligible probability), outputs
(with overwhelming probability) a sample $(\mathbf{y}, s)$ where $\sqrt{2}\beta_\infty \cdot \eta \le s \le 2\beta_\infty\beta \cdot \eta$ and $\mathbf{y}$ has distribution
statistically close to $D_{\Lambda,s}$.

Proof. By applying the LLL basis reduction algorithm [13] to the basis $\mathbf{B}$, we can assume without loss
of generality that $\|\tilde{\mathbf{B}}\| \le 2^n \cdot \eta(\Lambda)$. Let $\omega_n$ be an arbitrary function in $n$ satisfying $\omega_n = \omega(\sqrt{\log n})$ and
$\omega_n \le \sqrt{n}/2$.

The main procedure, described below, produces samples having parameters in the range $[1, q] \cdot \sqrt{2}\beta_\infty \cdot \eta$.
On these samples we run the procedure from Lemma 3.5 (with $R = \sqrt{2}\beta_\infty q$) to obtain samples having
parameters in the range $[\sqrt{2}, 2] \cdot \beta_\infty q \cdot \eta$. Finally, we invoke the reduction from Lemma 3.6 on those samples
to obtain a sample satisfying the conditions in the Lemma statement.

The main procedure works in a sequence of phases $i = 0, 1, 2, \ldots$. In phase $i$, the input is a basis $\mathbf{B}_i$
of $\Lambda$, where initially $\mathbf{B}_0 = \mathbf{B}$. The basis $\mathbf{B}_i$ is used in the discrete Gaussian sampling algorithm of [12,
Theorem 4.1] to produce samples $(\mathbf{y}, s_i)$, where $s_i = \max\{\|\tilde{\mathbf{B}}_i\| \cdot \omega_n, \sqrt{2}\beta_\infty\eta\} \ge \sqrt{2}\beta_\infty\eta$ and $\mathbf{y}$ has
distribution statistically close to $D_{\Lambda, s_i}$. Phase $i$ either manages to produce a sample $(\mathbf{y}, s)$ with $s$ in the
desired range $[1, q] \cdot \sqrt{2}\beta_\infty\eta$, or it produces a new basis $\mathbf{B}_{i+1}$ for which $\|\tilde{\mathbf{B}}_{i+1}\| \le \|\tilde{\mathbf{B}}_i\|/2$, which is the
input to the next phase. The number of phases before termination is clearly polynomial in $n$, by hypothesis
on $\mathbf{B}$.

If $\|\tilde{\mathbf{B}}_i\| \cdot \omega_n \le \sqrt{2}q\beta_\infty\eta$, then this already gives samples with $s_i \in [1, q] \cdot \sqrt{2}\beta_\infty\eta$ in the desired range, and
we can terminate the main procedure. So, we may assume that $s_i = \|\tilde{\mathbf{B}}_i\| \cdot \omega_n \ge \sqrt{2}q\beta_\infty\eta$. Each phase $i$ proceeds
in some constant $c \ge 1/\delta$ number of sub-phases $j = 1, 2, \ldots, c$, where the inputs to the first sub-phase
are the samples $(\mathbf{y}, s_i)$ generated as described above. We recall that these samples satisfy $s_i \ge \sqrt{2}q\beta_\infty\eta$.
The same will be true for the samples passed as input to all other subsequent sub-phases. So, each sub-phase
receives as input samples $(\mathbf{y}, s)$ satisfying all the hypotheses of Lemma 3.6, and we can run the reduction
from that lemma to generate new samples $(\mathbf{y}', s')$ having parameters $s'$ bounded from above by $s_i \cdot (\beta/q)^j$,
and from below by $\sqrt{2}\beta_\infty\eta$. If any of the produced samples satisfies $s' \le q\sqrt{2}\beta_\infty\eta$, then we can terminate
the main procedure with $(\mathbf{y}', s')$ as output. Otherwise, all samples produced during the sub-phase satisfy
$s' > q\sqrt{2}\beta_\infty\eta$, and they can be passed as input to the next sub-phase. Notice that the total runtime of all
the sub-phases is $\mathrm{poly}(n)^c$, because each invocation of the reduction from Lemma 3.6 relies on $\mathrm{poly}(n)$
invocations of the reduction in the previous sub-phase; this is why we need to limit the number of sub-phases
to a constant $c$.

If phase $i$ ends up running all its sub-phases without ever finding a sample with $s' \in [1, q] \cdot \sqrt{2}\beta_\infty\eta$, then it
has produced samples whose parameters are bounded by $(\beta/q)^c \cdot s_i \le s_i/n$. It uses $n^2$ of these samples,
which with overwhelming probability have lengths all bounded by $s_i/\sqrt{n}$, and include $n$ linearly independent
vectors. It transforms those vectors into a basis $\mathbf{B}_{i+1}$ with $\|\tilde{\mathbf{B}}_{i+1}\| \le s_i/\sqrt{n} \le \|\tilde{\mathbf{B}}_i\|\omega_n/\sqrt{n} \le \|\tilde{\mathbf{B}}_i\|/2$, as
input to the next phase. □
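The bookkeeping in this proof can be written out as a short worked derivation. It uses only the hypotheses $q/\beta \ge n^{\delta}$, $c \ge 1/\delta$ and $\omega_n \le \sqrt{n}/2$ from the lemma and its proof; the explicit phase count at the end additionally assumes $\sqrt{2}q\beta_\infty \ge 1$ and is meant as a rough estimate only.

```latex
% Width after c sub-phases of phase i:
\[
  s' \;\le\; s_i \cdot (\beta/q)^{c} \;\le\; s_i \cdot n^{-\delta c} \;\le\; s_i / n .
\]
% Lengths of the resulting samples (with overwhelming probability) and the new basis:
\[
  \|\mathbf{y}'\| \;\le\; \sqrt{n}\, s' \;\le\; s_i/\sqrt{n},
  \qquad
  \|\tilde{\mathbf{B}}_{i+1}\| \;\le\; \frac{s_i}{\sqrt{n}}
  \;=\; \frac{\|\tilde{\mathbf{B}}_i\|\,\omega_n}{\sqrt{n}}
  \;\le\; \frac{\|\tilde{\mathbf{B}}_i\|}{2}.
\]
% Halving from the LLL bound, phase i works with a basis of norm at most 2^{n-i} * eta,
% so the termination test  ||B_i|| * omega_n <= sqrt(2) q beta_infty eta  is met once
\[
  2^{\,n-i}\,\omega_n \;\le\; \sqrt{2}\, q\, \beta_\infty
  \quad\Longleftarrow\quad
  i \;\ge\; n + \log_2 \omega_n ,
\]
% i.e. after at most about n + log_2(omega_n) phases.
```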

We can now prove our main theorem, reducing worst-case lattice problems with $\max\{1, \beta\beta_\infty/q\} \cdot
O(\beta\sqrt{n})$ approximation factors to SIS, when $q \ge \beta \cdot n^{\Omega(1)}$.

Theorem 3.8. Let $m, n$ be integers, $S = \{\mathbf{z} \in \mathbb{Z}^m \setminus \{\mathbf{0}\} \mid \|\mathbf{z}\| \le \beta \wedge \|\mathbf{z}\|_\infty \le \beta_\infty\}$ for some real
$\beta \ge \beta_\infty > 0$, and $q \ge \beta \cdot n^{\Omega(1)}$ be an integer modulus with at most $\mathrm{poly}(n)$ integer divisors less than $\beta_\infty$.
For some $\gamma = \max\{1, \beta\beta_\infty/q\} \cdot O(\beta\sqrt{n})$, there is an efficient reduction from $\mathrm{SIVP}^{\eta}_{\gamma}$ (and hence also from
standard $\mathrm{SIVP}_{\gamma\omega_n}$) on $n$-dimensional lattices to $S$-collision finding for $\mathrm{SIS}(m, n, q)$ with non-negligible
advantage.


Proof. Given an input basis $\mathbf{B}$ of a lattice $\Lambda$, we can apply the LLL algorithm to obtain a $2^n$-approximation
to $\eta(\Lambda)$, and by scaling we can assume that $\eta(\Lambda) \in [1, 2^n]$. For $i = 1, \ldots, n$, we run the procedure described
below for each hypothesized upper bound $\eta_i = 2^i$ on $\eta(\Lambda)$. Each call to the procedure either fails, or returns
a set of linearly independent vectors in $\Lambda$ whose lengths are all bounded by $(\gamma/2) \cdot \eta_i$. We return the first
such obtained set (i.e., for the minimal value of $i$). As we show below, as long as $\eta_i \ge \eta(\Lambda)$ the procedure
returns a set of vectors with overwhelming probability. Since one $\eta_i \in [1, 2) \cdot \eta(\Lambda)$, our reduction solves
$\mathrm{SIVP}^{\eta}_{\gamma}$ with overwhelming probability, as claimed.

The procedure invokes the reduction from Lemma 3.7 with $\eta = \eta_i$ to obtain samples with parameters
in the range $[\sqrt{2}\beta_\infty, \sqrt{2}\beta\beta_\infty] \cdot \eta$. On these samples we run the procedure from Lemma 3.5 with $R =
\max\{\sqrt{2}q, \sqrt{2}\beta\beta_\infty\}$ to obtain samples having parameters in the range $[R, \sqrt{2}R] \cdot \eta$. On such samples we
repeatedly run (using independent samples each time) the reduction from Lemma 3.6. After enough runs, we
obtain with overwhelming probability a set of linearly independent lattice vectors all having lengths at most
$(\gamma/2) \cdot \eta$, as required. □
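The outer loop over hypothesized bounds $\eta_i = 2^i$ is a simple guess-and-double search. The sketch below shows only that control flow; `procedure` is a hypothetical callback standing in for the Lemma 3.7 / 3.5 / 3.6 pipeline, and the stub used in the example merely pretends to succeed whenever the guess is at least an assumed true value of $\eta(\Lambda)$.

```python
def first_successful_guess(n, procedure):
    """Try eta_i = 2^i for i = 1..n and return the first guess on which procedure succeeds."""
    for i in range(1, n + 1):
        eta_i = 2 ** i
        result = procedure(eta_i)       # either None (failure) or a set of short vectors
        if result is not None:
            return eta_i, result
    return None

# Hypothetical stub: succeeds exactly when the guess upper-bounds the (assumed) true eta.
TRUE_ETA = 5.3
stub = lambda eta: "short vectors" if eta >= TRUE_ETA else None

guess, vectors = first_successful_guess(10, stub)
```

Because consecutive guesses differ by a factor of 2, the first guess that is at least $\eta(\Lambda)$ is less than $2\eta(\Lambda)$, which is where the factor $\gamma/2$ in the length bound is absorbed.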


4  Hardness  of  LWE  with  Small  Uniform  Errors 

In this section we prove the hardness of inverting the LWE function even when the error vectors have very
small entries, provided the number of samples is sufficiently small. We proceed similarly to [23, 4], by using
the LWE assumption (for discrete Gaussian error) to construct a lossy family of functions with respect to
a uniform distribution over small inputs. However, the parameterization we obtain is different from those
in [23, 4], allowing us to obtain pseudorandomness of LWE under very small (e.g., binary) inputs, for a
number of LWE samples that exceeds the LWE dimension.

Our results and proofs are more naturally formulated using the SIS function family. So, we will first
study the problem in terms of SIS, and then reformulate the results in terms of LWE using Proposition 2.9.
We recall that the main difference between this section and Section 3 is that here we consider parameters
for which the resulting functions are essentially injective, or more formally, statistically second-preimage
resistant. The following lemma gives sufficient conditions that ensure this property.

Lemma 4.1. For any integers $m, k, q, s$ and set $X \subseteq [s]^m$, the function family $\mathrm{SIS}(m, k, q)$ is (statistically)
$\epsilon$-second preimage resistant with respect to the uniform input distribution $U(X)$ for $\epsilon = |X| \cdot (s'/q)^k$,
where $s'$ is the largest factor of $q$ smaller than $s$.

Proof. Let $\mathbf{x} \leftarrow U(X)$ and $\mathbf{A} \leftarrow \mathrm{SIS}(m, k, q)$ be chosen at random. We want to evaluate the probability
that there exists an $\mathbf{x}' \in X \setminus \{\mathbf{x}\}$ such that $\mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{x}' \pmod q$, or, equivalently, $\mathbf{A}(\mathbf{x} - \mathbf{x}') = \mathbf{0}
\pmod q$. Fix any two distinct vectors $\mathbf{x}, \mathbf{x}' \in X$ and let $\mathbf{z} = \mathbf{x} - \mathbf{x}'$. The vector $\mathbf{A}\mathbf{z} \pmod q$ is distributed
uniformly at random in $(d\mathbb{Z}/q\mathbb{Z})^k$, where $d = \gcd(q, z_1, \ldots, z_m)$. All coordinates of $\mathbf{z}$ are in the range
$z_i \in \{-(s-1), \ldots, (s-1)\}$, and at least one of them is nonzero. Therefore, $d$ is at most $s'$ and $|d\mathbb{Z}_q^k| =
(q/d)^k \ge (q/s')^k$. By union bound (over $\mathbf{x}' \in X \setminus \{\mathbf{x}\}$) for any $\mathbf{x}$, the probability that there is a second
preimage $\mathbf{x}'$ is at most $(|X| - 1)(s'/q)^k$. □

We remark that, as shown in Section 3, even for parameter settings that do not fall within the range
specified in Lemma 4.1, $\mathrm{SIS}(m, k, q)$ is collision resistant, and therefore also (computationally)
second-preimage-resistant. This is all that is needed in the rest of this section. However, when $\mathrm{SIS}(m, k, q)$ is not
statistically second-preimage resistant, the one-wayness proof that follows (see Theorem 4.5) is not very
interesting: typically, in such settings, $\mathrm{SIS}(m, k, q)$ is also statistically uninvertible, and the one-wayness
of $\mathrm{SIS}(m, k, q)$ directly follows from Lemma 2.2. So, below we focus on parameter settings covered by
Lemma 4.1.
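For very small parameters, the union bound in the proof of Lemma 4.1 can be compared against the exact second-preimage probability by enumerating every matrix $\mathbf{A}$. The following sketch assumes $[s] = \{0, \ldots, s-1\}$, takes $X = \{0,1\}^3$, $k = 2$ and prime $q = 5$, and uses helper names (`second_preimage_probability`, `lemma_bound`) that are not from the text.

```python
from fractions import Fraction
from itertools import product

def second_preimage_probability(m, k, q, X):
    """Exact Pr over uniform A in Z_q^{k x m} and x in X that some x' != x has A x' = A x mod q."""
    hits = trials = 0
    for flat in product(range(q), repeat=k * m):
        A = [flat[r * m:(r + 1) * m] for r in range(k)]
        images = [tuple(sum(a * b for a, b in zip(row, x)) % q for row in A) for x in X]
        hits += sum(1 for img in images if images.count(img) > 1)
        trials += len(X)
    return Fraction(hits, trials)

def lemma_bound(q, k, s, size_X):
    """epsilon = |X| * (s'/q)^k with s' the largest factor of q smaller than s."""
    s_prime = max(d for d in range(1, s) if q % d == 0)
    return size_X * Fraction(s_prime, q) ** k

m, k, q, s = 3, 2, 5, 2
X = list(product(range(s), repeat=m))          # X = [s]^m = {0,1}^3
exact = second_preimage_probability(m, k, q, X)
bound = lemma_bound(q, k, s, len(X))
```

Here $s' = 1$, so the bound evaluates to $8/25$, and the enumeration confirms that the exact probability does not exceed it.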

We prove the one-wayness of $\mathcal{F} = \mathrm{SIS}(m, k, q, X)$ with respect to the uniform input distribution
$\mathcal{X} = U(X)$ by building a lossy function family $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ where $\mathcal{L}$ is an auxiliary function family that we
will prove to be uninvertible and computationally indistinguishable from $\mathcal{F}$. The auxiliary family $\mathcal{L}$ is derived
from the following function family.

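Definition 4.2. For any probability distribution $\mathcal{Y}$ over $\mathbb{Z}^{\ell}$ and integer $m \ge \ell$, let $\mathcal{I}(m, \ell, \mathcal{Y})$ be the
probability distribution over linear functions $[\mathbf{I} \mid \mathbf{Y}] : \mathbb{Z}^m \to \mathbb{Z}^{\ell}$ where $\mathbf{I}$ is the $\ell \times \ell$ identity matrix, and
$\mathbf{Y} \in \mathbb{Z}^{\ell \times (m - \ell)}$ is obtained choosing each column of $\mathbf{Y}$ independently at random from $\mathcal{Y}$.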

We anticipate that we will set $\mathcal{Y}$ to the Gaussian input distribution $\mathcal{Y} = D_{\mathbb{Z},\sigma}^{\ell}$ in order to make $\mathcal{L}$
indistinguishable from $\mathcal{F}$ under a standard LWE assumption. But for generality, we prove some of our results
with respect to a generic distribution $\mathcal{Y}$.

The following lemma shows that for a bounded distribution $\mathcal{Y}$ (and appropriate parameters), $\mathcal{I}(m, \ell, \mathcal{Y})$
is (statistically) uninvertible.

Lemma 4.3. Let $\mathcal{Y}$ be a probability distribution on $[\mathcal{Y}] \subseteq \{-\sigma, \ldots, \sigma\}^{\ell}$, and let $X \subseteq \{-s, \ldots, s\}^m$. Then
$\mathcal{I}(m, \ell, \mathcal{Y})$ is $\epsilon$-uninvertible with respect to $U(X)$ for $\epsilon = (1 + 2s(1 + \sigma(m - \ell)))^{\ell}/|X|$.

Proof. Let $f = [\mathbf{I} \mid \mathbf{Y}]$ be an arbitrary function in the support of $\mathcal{I}(m, \ell, \mathcal{Y})$. We know that $|y_{i,j}| \le \sigma$ for all
$i, j$. We first bound the size of the image $|f(X)|$. By the triangle inequality, all the points in the image $f(X)$
have $\ell_\infty$ norm at most

\[ \|f(\mathbf{u})\|_\infty \le \|\mathbf{u}\|_\infty (1 + \sigma(m - \ell)) \le s(1 + \sigma(m - \ell)). \]

The number of integer vectors (in $\mathbb{Z}^{\ell}$) with such bounded norm is

\[ (1 + 2s(1 + \sigma(m - \ell)))^{\ell}. \]

Dividing by the size of $X$ and using Lemma 2.4, the claim follows. □
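The image-size bound in this proof is easy to confirm by enumeration on a small instance. The sketch below fixes one matrix $\mathbf{Y}$ with entries in $\{-1, 0, 1\}$ (so $\sigma = 1$), takes $m = 6$, $\ell = 1$, $s = 1$, and compares $|f(X)|$ with $(1 + 2s(1 + \sigma(m - \ell)))^{\ell}$. The concrete $\mathbf{Y}$ and the helper name `image_of` are choices made for this example, and reading $\epsilon$ as (image-size bound)/$|X|$ assumes Lemma 2.4 has the form used in the proof.

```python
from itertools import product

def image_of(Y, X):
    """Image of X under f(u) = u1 + Y u2, where u = (u1, u2) and Y has ell rows."""
    ell = len(Y)
    image = set()
    for u in X:
        u1, u2 = u[:ell], u[ell:]
        image.add(tuple(u1[r] + sum(y * v for y, v in zip(Y[r], u2)) for r in range(ell)))
    return image

m, ell, s, sigma = 6, 1, 1, 1
Y = [[1, -1, 0, 1, -1]]                                  # entries bounded by sigma
X = list(product(range(-s, s + 1), repeat=m))            # X = {-s, ..., s}^m
image = image_of(Y, X)

norm_bound = s * (1 + sigma * (m - ell))                 # l_infinity bound on f(u)
count_bound = (1 + 2 * norm_bound) ** ell                # number of integer points in that box
epsilon = count_bound / len(X)                           # bound of Lemma 4.3
```

For this instance the image has 11 points against a bound of 13, while $|X| = 729$, so the resulting $\epsilon$ is about 0.018.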

Lemma 4.3 applies to any distribution $\mathcal{Y}$ with bounded support. When $\mathcal{Y} = D_{\mathbb{Z},\sigma}^{\ell}$ is a discrete Gaussian
distribution,  a  slightly  better  bound  can  be  obtained.  (See  also  0 ,  which  proves  a  similar  lemma  for  a 
different,  non-uniform  input  distribution  X.) 

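Lemma 4.4. Let $\mathcal{Y} = D_{\mathbb{Z},\sigma}^{\ell}$ be the discrete Gaussian distribution with parameter $\sigma > 0$, and let $X \subseteq
\{-s, \ldots, s\}^m$. Then $\mathcal{I}(m, \ell, \mathcal{Y})$ is $\epsilon$-uninvertible with respect to $U(X)$, for $\epsilon = O(\sigma m s/\sqrt{\ell})^{\ell}/|X| + 2^{-\Omega(m)}$.

Proof. Again, by Lemma 2.4 it is enough to bound the expected size of $f(X)$ when $f \leftarrow \mathcal{I}(m, \ell, \mathcal{Y})$ is
chosen at random. Remember that $f = [\mathbf{I} \mid \mathbf{Y}]$ where $\mathbf{Y} \leftarrow D_{\mathbb{Z},\sigma}^{\ell \times (m - \ell)}$. Since the entries of $\mathbf{Y} \in \mathbb{R}^{\ell \times (m - \ell)}$
are independent mean-zero subgaussians with parameter $\sigma$, by a standard bound from the theory of random
matrices, the largest singular value $s_1(\mathbf{Y}) = \max_{\mathbf{0} \ne \mathbf{x} \in \mathbb{R}^{m - \ell}} \|\mathbf{Y}\mathbf{x}\|/\|\mathbf{x}\|$ of $\mathbf{Y}$ is at most $\sigma \cdot O(\sqrt{\ell} + \sqrt{m - \ell}) =
\sigma \cdot O(\sqrt{m})$, except with probability $2^{-\Omega(m)}$. We now bound the norm of all vectors in the image $f(X)$.
Let $\mathbf{u} = (\mathbf{u}_1, \mathbf{u}_2) \in X$, with $\mathbf{u}_1 \in \mathbb{Z}^{\ell}$ and $\mathbf{u}_2 \in \mathbb{Z}^{m - \ell}$. Then

\begin{align*}
\|f(\mathbf{u})\| &\le \|\mathbf{u}_1 + \mathbf{Y}\mathbf{u}_2\| \\
 &\le \|\mathbf{u}_1\| + \|\mathbf{Y}\mathbf{u}_2\| \\
 &\le \left(\sqrt{\ell} + s_1(\mathbf{Y})\sqrt{m - \ell}\right) s \\
 &\le \left(\sqrt{\ell} + \sigma \cdot O(\sqrt{m})\sqrt{m - \ell}\right) s \\
 &= O(\sigma m s).
\end{align*}

The number of integer points in the $\ell$-dimensional zero-centered ball of radius $R = O(\sigma m s)$ can be bounded
by a simple volume argument, as $|f(X)| \le (R + \sqrt{\ell}/2)^{\ell} V_{\ell} = O(\sigma m s/\sqrt{\ell})^{\ell}$, where $V_{\ell} = \pi^{\ell/2}/(\ell/2)!$ is the
volume of the $\ell$-dimensional unit ball. Dividing by the size of $X$ and accounting for the rare event that $s_1(\mathbf{Y})$
is not bounded as above, we get that $\mathcal{I}(m, \ell, \mathcal{Y})$ is $\epsilon$-uninvertible for $\epsilon = O(\sigma m s/\sqrt{\ell})^{\ell}/|X| + 2^{-\Omega(m)}$. □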
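The volume argument in this proof (unit cubes centered at the integer points of a radius-$R$ ball all fit inside the ball of radius $R + \sqrt{\ell}/2$) can be sanity-checked by direct counting in low dimension. This is a small illustrative sketch: the function names are introduced here, and $(\ell/2)!$ is evaluated as $\Gamma(\ell/2 + 1)$.

```python
import math
from itertools import product

def unit_ball_volume(ell):
    """V_ell = pi^(ell/2) / (ell/2)!, with the factorial taken as Gamma(ell/2 + 1)."""
    return math.pi ** (ell / 2) / math.gamma(ell / 2 + 1)

def count_integer_points(ell, R):
    """Number of integer points in the ell-dimensional zero-centered ball of radius R."""
    r = math.floor(R)
    return sum(1 for p in product(range(-r, r + 1), repeat=ell)
               if sum(c * c for c in p) <= R * R)

def volume_bound(ell, R):
    """Upper bound (R + sqrt(ell)/2)^ell * V_ell on the number of integer points."""
    return (R + math.sqrt(ell) / 2) ** ell * unit_ball_volume(ell)

checks = [(ell, R, count_integer_points(ell, R), volume_bound(ell, R))
          for ell in (1, 2, 3) for R in (2.5, 5.5)]
```

Each tuple in `checks` pairs an exact count with the corresponding bound; for instance, in dimension 2 with $R = 2.5$ there are 21 integer points against a bound of roughly 32.3.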


We can now prove the one-wayness of the SIS function family by defining and analyzing an appropriate
lossy function family. The parameters below are set up to expose the connection with LWE, via
Proposition 2.9: $\mathrm{SIS}(m, m - n, q)$ corresponds to LWE in $n$ dimensions (given $m$ samples), whose one-wayness
we are proving, while $\mathrm{SIS}(\ell = m - n + k, m - n, q)$ corresponds to LWE in $k \le n$ dimensions, whose
pseudorandomness we are assuming.


Theorem 4.5. Let $q$ be a modulus and let $\mathcal{X}, \mathcal{Y}$ be two distributions over $\mathbb{Z}^m$ and $\mathbb{Z}^{\ell}$ respectively, where
$\ell = m - n + k$ for some $0 < k \le n \le m$, such that

1. $\mathcal{I}(m, \ell, \mathcal{Y})$ is uninvertible with respect to input distribution $\mathcal{X}$,

2. $\mathrm{SIS}(\ell, m - n, q)$ is pseudorandom with respect to input distribution $\mathcal{Y}$, and

3. $\mathrm{SIS}(m, m - n, q)$ is second-preimage resistant with respect to input distribution $\mathcal{X}$.

Then $\mathcal{F} = \mathrm{SIS}(m, m - n, q)$ is one-way with respect to input distribution $\mathcal{X}$.

In particular, if $\mathrm{SIS}(\ell, m - n, q)$ is pseudorandom with respect to the discrete Gaussian distribution
$\mathcal{Y} = D_{\mathbb{Z},\sigma}^{\ell}$, then $\mathrm{SIS}(m, m - n, q)$ is $(2\epsilon + 2^{-\Omega(m)})$-one-way with respect to the uniform input distribution
$\mathcal{X} = U(X)$ over any set $X \subseteq \{-s, \ldots, s\}^m$ satisfying

\[ (C'\sigma m s/\sqrt{\ell})^{\ell}/\epsilon \le |X| \le \epsilon \cdot (q/s')^{m-n}, \]

where $s'$ is the largest divisor of $q$ that is smaller than or equal to $2s$, and $C'$ is the universal constant hidden
by the $O(\cdot)$ notation from Lemma 4.4.


Proof. We will prove that $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is a lossy function family, where $\mathcal{F} = \mathrm{SIS}(m, m - n, q)$ and $\mathcal{L} =
\mathrm{SIS}(\ell, m - n, q) \circ \mathcal{I}(m, \ell, \mathcal{Y})$. It follows from Lemma 2.3 that both $\mathcal{F}$ and $\mathcal{L}$ are one-way function families
with respect to input distribution $\mathcal{X}$. Notice that $\mathcal{F}$ is second-preimage resistant with respect to $\mathcal{X}$ by
assumption. The indistinguishability of $\mathcal{L}$ and $\mathcal{F}$ follows immediately from the pseudorandomness of
$\mathrm{SIS}(\ell, m - n, q)$ with respect to $\mathcal{Y}$, by a standard hybrid argument. So, in order to prove that $(\mathcal{L}, \mathcal{F}, \mathcal{X})$ is
a lossy function family, it suffices to prove that $\mathcal{L}$ is uninvertible with respect to $\mathcal{X}$. This follows by applying
Lemma 2.5 to the function family $\mathcal{I}(m, \ell, \mathcal{Y})$, which is uninvertible by assumption. This proves the first part
of the theorem.

Now consider the particular instantiation. Let $\mathcal{X} = U(X)$ be the uniform distribution over a set
$X \subseteq \{-s, \ldots, s\}^m$ whose size satisfies the inequalities in the theorem statement, and let $\mathcal{Y} = D_{\mathbb{Z},\sigma}^{\ell}$.
Since $|X|(s'/q)^{m-n} \le \epsilon$, by Lemma 4.1, $\mathrm{SIS}(m, m - n, q)$ is (statistically) second-preimage resistant with
respect to input distribution $\mathcal{X}$. Moreover, since $(C'\sigma m s/\sqrt{\ell})^{\ell}/|X| \le \epsilon$, by Lemma 4.4, $\mathcal{I}(m, \ell, \mathcal{Y})$ is
$(\epsilon + 2^{-\Omega(m)})$-uninvertible with respect to input distribution $\mathcal{X}$. □


In order to conclude that the LWE function is pseudorandom (under worst-case lattice assumptions) for
uniformly random small errors, we combine Theorem 4.5 with Corollary 2.14, instantiating the parameters
appropriately. For simplicity, we focus on the important case of a prime modulus $q$. Nearly identical results
for composite moduli (e.g., those divisible by only small primes) are also easily obtained from Corollary 2.14,
or by using either Theorem 2.13 or Theorem 2.12.


Theorem 4.6. Let $0 < k \le n \le m - \omega(\log k) \le k^{O(1)}$, $\ell = m - n + k$, $s \ge (Cm)^{\ell/(n-k)}$ for a large
enough universal constant $C$, and $q$ be a prime such that $\max\{3\sqrt{k}, (4s)^{m/(m-n)}\} \le q \le k^{O(1)}$. For
any set $X \subseteq \{-s, \ldots, s\}^m$ of size $|X| \ge s^m$, the $\mathrm{SIS}(m, m - n, q)$ (equivalently, $\mathrm{LWE}(m, n, q)$) function
family is one-way (and pseudorandom) with respect to the uniform input distribution $\mathcal{X} = U(X)$, under the
assumption that $\mathrm{SIVP}_{\gamma}$ is (quantum) hard to approximate, in the worst case, on $k$-dimensional lattices to
within a factor $\gamma = O(\sqrt{k} \cdot q)$.

A few notable instantiations are as follows. To obtain pseudorandomness for binary errors, we need $s = 2$
and $X = \{0, 1\}^m$. For this value of $s$, the condition $s \ge (Cm)^{\ell/(n-k)}$ can equivalently be rewritten as

\[ m \le (n - k) \cdot \left(1 + \frac{1}{\log_2(Cm)}\right), \]

which can be satisfied by taking $k = n/(C' \log_2 n)$ and $m = n(1 + 1/(c \log_2 n))$ for any desired $c > 1$ and a
sufficiently large constant $C' > 1/(1 - 1/c)$. For these values, the modulus should satisfy $q \ge 8^{m/(m-n)} =
8n^{3c} = k^{O(1)}$, and can be set to any sufficiently large prime $p = k^{O(1)}$.¹
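The two computations behind this instantiation can be spelled out as a worked derivation. Nothing here goes beyond the conditions of Theorem 4.6 with $s = 2$, $\ell = m - n + k$ and $m = n(1 + 1/(c \log_2 n))$.

```latex
% Rewriting the condition on s for binary errors (s = 2):
\begin{align*}
  2 \ge (Cm)^{\ell/(n-k)}
  &\iff \frac{\ell}{n-k}\,\log_2(Cm) \le 1 \\
  &\iff m - n + k \le \frac{n-k}{\log_2(Cm)} \\
  &\iff m \le (n-k)\left(1 + \frac{1}{\log_2(Cm)}\right).
\end{align*}
% Evaluating the lower bound on the modulus, (4s)^{m/(m-n)} with s = 2:
\[
  \frac{m}{m-n} = \frac{n\,(1 + 1/(c\log_2 n))}{n/(c\log_2 n)} = c\log_2 n + 1,
  \qquad
  8^{\,m/(m-n)} = 8 \cdot 2^{3c\log_2 n} = 8\,n^{3c}.
\]
```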

Notice that for binary errors, both the worst-case lattice dimension $k$ and the number $m - n$ of "extra"
LWE samples (i.e., the number of samples beyond the LWE dimension $n$) are sublinear in the LWE
dimension $n$: we have $k = \Theta(n/\log n)$ and $m - n = \Theta(n/\log n)$. This corresponds to both a stronger
worst-case security assumption, and a less useful LWE problem. By using larger errors, say, bounded by
$s = n^{\epsilon}$ for some constant $\epsilon > 0$, it is possible to make both the worst-case lattice dimension $k$ and number
of extra samples $m - n$ into (small) linear functions of the LWE dimension $n$, which may be sufficient for
some cryptographic applications of LWE. Specifically, for any constant $\epsilon < 1$, one may set $k = (\epsilon/3)n$ and
$m = (1 + \epsilon/3)n$, which are easily verified to satisfy all the hypotheses of Theorem 4.6 when $q = k^{O(1)}$
is sufficiently large. These parameters correspond to $(\epsilon/3)n = \Omega(n)$ extra samples (beyond the LWE
dimension $n$), and to the worst-case hardness of lattice problems in dimension $(\epsilon/3)n = \Omega(n)$. Notice that
for $\epsilon < 1/2$, this version of LWE has much smaller errors than allowed by previous LWE hardness proofs,
and  it  would  be  subject  to  subexponential-time  attacks  0  if  the  number  of  samples  were  not  restricted.  Our 
result shows that if the number of samples is limited to $(1 + \epsilon/3)n$, then LWE maintains its provable security
properties and conjectured exponential-time hardness in the dimension $n$.

1 Here we have not tried to optimize the value of $q$, and smaller values of the modulus are certainly possible: a close inspection of the proof of Theorem 4.6 reveals that for binary errors, the condition $q \ge 8n^{3c}$ can be replaced by $q \ge n^{c'}$ for any constant $c' > c$.

One last instantiation allows for a linear number of samples $m = c \cdot n$ for any desired constant $c > 1$, which is enough for most applications of LWE in lattice cryptography. In this case we can choose (say) $k = n/2$, and it suffices to set the other parameters so that

$$s \ge (Cm)^{2c-1} \quad\text{and}\quad q \ge 4^{c/(c-1)} \cdot (Ccn)^{2c+1+1/(c-1)} = k^{O(1)}.$$


(We can also obtain better lower bounds on $s$ and $q$ by letting $k$ be a smaller constant fraction of $n$.) This proves the hardness of LWE with uniform noise of polynomial magnitude $s = n^{O(1)}$, and any linear number of samples $m = O(n)$. Note that for $m = cn$, any instantiation of the parameters requires the magnitude $s$ of the errors to be at least $n^{c-1}$. For $c > 3/2$, this is more noise than is typically used in the standard LWE problem, which allows errors of magnitude as small as $O(\sqrt{n})$, but requires them to be independent and follow a Gaussian-like distribution. The novelty in this last instantiation of Theorem 4.6 is that it allows for a much wider class of error distributions, including the uniform distribution, and distributions where different components of the error vector are correlated.
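As a check on the exponents in this last instantiation, the following worked computation substitutes $k = n/2$ and $m = cn$ into the two conditions used in the proof of Theorem 4.6, namely $s \ge (Cm)^{\ell/(n-k)}$ with $\ell = m - n + k$, and $q \ge (4s)^{m/(m-n)}$. It takes $s$ at its smallest admissible value; the constant factor $4^{c/(c-1)}$ is written in one of several equivalent forms.

```latex
\begin{align*}
\ell &= m - n + k = cn - n + \tfrac{n}{2} = \left(c - \tfrac{1}{2}\right) n,
&\frac{\ell}{n-k} &= \frac{\left(c - \tfrac{1}{2}\right) n}{n/2} = 2c - 1,\\[4pt]
\frac{m}{m-n} &= \frac{cn}{(c-1)n} = \frac{c}{c-1},
&(4s)^{m/(m-n)} &= 4^{\frac{c}{c-1}} \, (Ccn)^{\frac{(2c-1)c}{c-1}},\\[4pt]
\frac{(2c-1)c}{c-1} &= \frac{(2c+1)(c-1) + 1}{c-1} = 2c + 1 + \frac{1}{c-1}.
\end{align*}
```

The same substitution gives the lower bound on the noise quoted above: since $\ell/(n-k) \ge (m-n)/n = c - 1$ for every admissible $k$, the magnitude $s$ is at least $n^{c-1}$.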


Proof of Theorem 4.6. We prove the one-wayness of SIS$(m, m-n, q)$ (equivalently, LWE$(m, n, q)$ via Proposition 2.9) using the second part of Theorem 4.5 with $\sigma = 3\sqrt{k}$. Using $\ell \ge k$ and the primality of $q$, the conditions on the size of $X$ in Theorem 4.5 can be replaced by simpler bounds

$$\frac{(3C'ms)^{\ell}}{\epsilon} \le |X| \le \epsilon \cdot q^{m-n},$$

or equivalently, the requirement that the quantities $(3C'ms)^{\ell}/|X|$ and $|X|/q^{m-n}$ are negligible in $k$. For the first quantity, letting $C = 4C'$ and using $|X| \ge s^m$ and $s \ge (4C'm)^{\ell/(n-k)}$, we get that $(3C'ms)^{\ell}/|X| \le (3/4)^{\ell} \le (3/4)^{k}$ is exponentially small (in $k$). For the second quantity, using $|X| \le (2s+1)^m$ and $q \ge (4s)^{m/(m-n)}$, we get that $|X|/q^{m-n} \le (3/4)^m$ is also exponentially small.

Theorem 4.5 also requires the pseudorandomness of SIS$(\ell, m-n, q)$ with respect to the discrete Gaussian input distribution $\mathcal{Y} = D_{\mathbb{Z},\sigma}^{\ell}$, which can be based on the (quantum) worst-case hardness of SIVP on $k$-dimensional lattices using Corollary 2.14. (Notice the use of different parameters: SIS$(m, m-n, q)$ in Corollary 2.14 and SIS$(m-n+k, m-n, q)$ here.) After properly renaming the variables, and using $\sigma = 3\sqrt{k}$, the hypotheses of Corollary 2.14 become $\omega(\log k) \le m - n \le k^{O(1)}$ and $3\sqrt{k} \le q \le k^{O(1)}$, which are all satisfied by the hypotheses of the Theorem. The corresponding assumption is the worst-case hardness of SIVP$_\gamma$ on $k$-dimensional lattices, for $\gamma = k\,\omega_k\, q/\sigma = \sqrt{k}\,\omega_k\, q/3 = \tilde{O}(\sqrt{k}\,q)$, as claimed. This concludes the proof of the one-wayness of LWE.

The pseudorandomness of LWE follows from the sample-preserving search-to-decision reduction of [17]. □
Accelerating Computations on Encrypted Data with an FPGA¹

By  David  Bruce  Cousins  and  Kurt  Rohloff,  Raytheon  BBN  Technologies 

Healthcare,  financial,  government,  and  military  organizations  depend  on  encryption  to  secure  sensitive  data.  Historically,  this  data  has 
had  to  be  decrypted  before  it  could  be  processed  or  analyzed.  As  a  result,  data  processing  had  to  be  performed  on  secured  hardware, 
eliminating  the  possibility  of  using  the  cloud  or  other  low-cost,  third-party  computing  resources. 

At  the  Symposium  on  the  Theory  of  Computing  in  2009,  Craig  Gentry  of  IBM  presented  a  fully  homomorphic  encryption  (FHE)  scheme 
that  made  it  possible  to  send  sensitive  data  to  an  unsecured  server,  process  it  there,  and  receive  an  encrypted  result,  without  ever 
decrypting  the  original  data.  While  FHE  was  a  major  theoretical  breakthrough,  actual  FHE  implementations  are  many  orders  of 
magnitude  too  slow  to  be  of  practical  use,  particularly  for  large  encryption  keys  and  ciphertexts. 

As a step toward a practical FHE implementation, we have developed a somewhat homomorphic encryption (SHE) scheme that, with modifications, can be converted into an FHE scheme. Current FHE implementations depend on complicated operations that are inefficient when performed on a CPU, and our goal was to take advantage of the parallelism and pipelining of FPGAs using MATLAB, Simulink, and HDL Coder. Homomorphic encryption is an active area of study, and new advances are being made regularly. By using MATLAB and Simulink instead of a lower-level programming language, we can keep pace with these developments by rapidly implementing improvements to the algorithms.

Homomorphic  Encryption  Basics 

FHE enables secure and private computation using encrypted data. In theory, computations can be carried out using just two FHE operations: EvalAdd and EvalMult. These operations are similar to binary XOR and AND operations, but they operate on encrypted bits, and their result remains encrypted. Current FHE schemes are based on computationally intensive stochastic lattice theory problems. The stochastic component introduces noise into each EvalAdd or EvalMult operation, and the amount of noise increases rapidly with the number of operations performed. FHE schemes address this buildup of noise by periodically running a bootstrapping algorithm on the intermediate results. One problem with this approach is that the bootstrapping algorithm is computationally expensive. A second is that current FHE schemes entail modular arithmetic with a large modulus. The operations required are memory-intensive and inefficient when performed on standard CPUs.

SHE schemes avoid the need for bootstrapping by limiting the number of EvalAdd and EvalMult operations performed, so that the noise stays below an acceptable threshold. SHE schemes can be augmented with bootstrapping operations to produce FHE schemes, provided that the additional operations required for bootstrapping themselves keep the noise below this threshold.


¹ Sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under Contract No. FA8750-11-C-0098. The views expressed are those of the authors and do not necessarily reflect the official policy or position of the Department of Defense or the U.S. Government. Distribution Statement "A" (Approved for Public Release, Distribution Unlimited.)
Implementing Modulo Add and Multiply in MATLAB and Simulink

FHE  schemes  are  based  on  EvalAdd  and  EvalMult,  elementwise  vector  addition  and  multiplication  operations  performed  in  modulo  q 
arithmetic  where  q  is  a  large  prime  integer.  In  MATLAB,  EvalAdd  is  expressed  simply  as 

c  =  mod(a+b,  q) 

and  EvalMult  is  expressed  as 

c = mod(a.*b, q)

These straightforward representations make it easy to perform calculations on a CPU, but to exploit the parallelism of FPGAs we needed to develop efficient implementations of EvalAdd and EvalMult in VHDL. We began by prototyping algorithms in MATLAB, for example the EvalAdd operation with inputs bounded by the modulus q:

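c = a + b;               % elementwise sum; inputs satisfy 0 <= a, b < q
cgteq = (c >= q);        % logical mask of entries that reached or exceeded q
c(cgteq) = c(cgteq) - q; % one conditional subtraction returns them to [0, q)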

Our initial MATLAB representation served as a reference for the Simulink model, which would be used for hardware implementation and HDL code generation (Figure 1). With the initial MATLAB representation, we were able to explore several theoretical approaches because MATLAB handled large complicated vector and matrix operations quickly and naturally. Additionally, we were able to use Fixed-Point Designer to generate bit-accurate fixed-point solutions, including modeling rollover in our adders, with only minor modifications to our systems.
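To illustrate what "modeling rollover in our adders" can mean for modulo addition, here is a minimal Python sketch (not the authors' MATLAB or Simulink code). It assumes a hypothetical fixed-width unsigned adder and inputs already reduced modulo q; the function name `mod_add_fixed_width` and the `width` parameter are illustrative only. When q is close to the word size, a + b can wrap around, so the carry bit has to take part in the comparison against q.

```python
def mod_add_fixed_width(a: int, b: int, q: int, width: int = 64) -> int:
    """Add a and b modulo q using only `width`-bit unsigned arithmetic.

    Assumes 0 <= a, b < q < 2**width (hypothetical hardware word size).
    """
    mask = (1 << width) - 1
    total = (a + b) & mask      # what a width-bit adder would output
    carry = (a + b) >> width    # 1 if the adder rolled over, else 0
    if carry or total >= q:
        # The true sum is in [q, 2q), so one subtraction suffices; the
        # wrapped subtraction yields the right value even after a rollover.
        total = (total - q) & mask
    return total
```

Without the carry test, a rolled-over sum would look small, skip the subtraction, and come out wrong; this is the kind of corner case a bit-accurate fixed-point model exposes before HDL generation.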




Figure  1 .  Simulink  model  of  the  modulo  add  operation. 


Once  we  had  converged  on  a  sound  approach,  a  detailed  Simulink  model  was  created  for  HDL  implementation.  With  Simulink,  we  were 
able  to  lay  out  the  logical  components  in  the  design  and  automatically  generate  optimized  HDL  code.  We  were  also  able  to  add 
capabilities,  including  supporting  multiple  moduli  and  optimizing  our  models  to  use  table  lookups,  in  a  controlled,  incremental  way. 

The  model  can  process  one  pair  of  inputs  on  each  clock  cycle. 

To  verify  the  model,  we  compared  the  results  it  produced  with  the  results  from  our  MATLAB  code. 

Modulo  multiplication  is  much  more  complicated  than  modulo  addition.  To  manage  this  complexity  and  enable  more  efficient 
pipelining  in  the  generated  HDL,  we  have  developed  a  multiple-word  modulo  multiply  operation  based  on  the  Barrett  reduction 
algorithm.  Figure  2  shows  the  structure  of  a  pipelined,  four-stage  multiple-word  multiplier,  where  each  stage  operates  on  a  subword  of 
the  large  overall  word. 
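For readers unfamiliar with Barrett reduction, the following Python sketch shows the basic single-word idea: the division by q is replaced by a multiplication with a precomputed constant and a shift. It is an illustration under stated assumptions, not the pipelined multiple-word design of Figure 2; the names `barrett_setup` and `barrett_mulmod` are hypothetical.

```python
def barrett_setup(q: int) -> tuple:
    """Precompute the Barrett constant for modulus q.

    k is the bit length of q, and mu = floor(4**k / q).
    """
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_mulmod(a: int, b: int, q: int, k: int, mu: int) -> int:
    """Return (a * b) mod q for 0 <= a, b < q without dividing by q."""
    x = a * b                  # 0 <= x < q*q < 2**(2k)
    t = (x * mu) >> (2 * k)    # estimate of x // q; never an overestimate
    r = x - t * q              # small non-negative remainder candidate
    while r >= q:              # a short correction loop finishes the job
        r -= q
    return r
```

Because `mu` depends only on q, it can be stored in a table per modulus, which fits the table-lookup approach described for supporting multiple moduli.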
Figure 2. Simulink model of the 64-bit Barrett modulo multiply operation, showing four stages.


Figure  3  shows  the  detailed  model  of  a  single  stage,  also  pipelined  for  efficient  HDL  generation.  Here  again,  the  flexibility  of  Simulink 
enables  us  to  specify  the  actual  word  widths  at  run  time.  The  same  models  are  used  for  generating  48-bit  as  well  as  64-bit  arithmetic 
HDL. 
Figure 3. Simulink model of a single stage of the Barrett modulo multiply operation.


Implementing  the  Chinese  Remainder  Transform 

Our  SHE  scheme  uses  the  Chinese  remainder  transform  (CRT)  to  simplify  the  structure  of  our  add  and  multiply  operations.  CRT 
resembles  the  discrete  Fourier  transform  (DFT),  but  uses  modular  integer  arithmetic  instead  of  complex  arithmetic.  We  implemented  the 
CRT  as  an  EvalMult  operation  followed  by  a  fast  numeric  transform  (like  a  fast  Fourier  transform  [FFT]  but  with  modulo  arithmetic). 
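The structure described above, an elementwise multiply followed by a fast transform in modular arithmetic, can be sketched in Python as follows. This is an assumption-laden toy, not the authors' model: it assumes the ring is the integers modulo q with polynomials reduced modulo x^N + 1, that `psi` is a primitive 2N-th root of unity modulo q, and it uses a recursive radix-2 transform for readability rather than a streaming design. All function names are hypothetical.

```python
def ntt(a, omega, q):
    """Recursive radix-2 number theoretic transform of a (length a power of 2).

    omega must be a primitive len(a)-th root of unity modulo the prime q.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % q
        out[i] = (even[i] + t) % q            # butterfly, upper output
        out[i + n // 2] = (even[i] - t) % q   # butterfly, lower output
        w = w * omega % q
    return out


def crt(a, psi, q):
    """Elementwise multiply by powers of psi, then transform with omega = psi^2."""
    twisted = [a[i] * pow(psi, i, q) % q for i in range(len(a))]
    return ntt(twisted, psi * psi % q, q)


def inverse_crt(A, psi, q):
    """Inverse transform, then elementwise multiply by inverse powers of psi."""
    n = len(A)
    omega_inv = pow(psi * psi % q, -1, q)
    psi_inv, n_inv = pow(psi, -1, q), pow(n, -1, q)
    a = ntt(A, omega_inv, q)
    return [a[i] * n_inv % q * pow(psi_inv, i, q) % q for i in range(n)]


def negacyclic_mul(a, b, q):
    """Schoolbook product of a and b modulo x^N + 1 and q, for comparison."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c
```

With the toy parameters N = 8, q = 17, psi = 3, an elementwise product of two transformed vectors followed by `inverse_crt` matches `negacyclic_mul`, which is why ring multiplication reduces to an elementwise multiply in the transformed domain.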

To create the Simulink model for the FFT, we started with a standard streaming FFT, reordered the inputs, and converted from complex arithmetic to modulo arithmetic using integer fixed-point arithmetic (Figures 4a, 4b, and 4c).
Figure 4a. Streaming fast numeric transform modeled in Simulink.

Figure 4b. Detail of a single stage of the fast numeric transform showing the butterfly and shuffle models.


Figure  4c.  Detail  of  the  modulo  Butterfly  operation  of  the  fast  numeric  transform. 


We  were  able  to  develop  a  streaming  model  that  consisted  of  two  basic  blocks  that  are  parameterized  at  run  time.  By  cascading  log2(N) 
blocks  together  we  can  assemble  FFTs  of  various  power-of-2  sizes.  This  approach  allows  us  to  process  two  input  samples  every  clock 
cycle. We have implemented and benchmarked up to a 2^14 size CRT on a Xilinx Virtex-7 in this manner.

Moving  from  Floating  Point  to  Fixed  Point  and  from  Model  to  VHDL 

When  we  started  prototyping  in  MATLAB  we  used  floating-point  math,  which  provided  a  quick,  easy  way  to  understand  the 
computations  that  we  needed  to  support.  We  then  used  Fixed-Point  Designer  to  transition  from  floating-point  to  fixed-point  arithmetic 
with  varying  integer  bit-widths. 

Our  MATLAB  code  and  Simulink  models  use  the  same  fixed-point  variables  and  produce  output  in  the  same  format,  simplifying 
verification  of  test  results.  When  running  simulations,  we  can  specify  the  bit-width  of  the  input  data.  The  intermediate  mathematical 
operations are automatically sized by Fixed-Point Designer, enabling us to use the same models for inputs of varying bit-widths.

The transition to fixed-point math is a required step for VHDL implementation. We generated the required VHDL from our Simulink models using HDL Coder. We simulated the VHDL using Mentor Graphics ModelSim, and synthesized the code on a Xilinx Virtex FPGA. The generated VHDL can be used on FPGAs from different vendors, enabling us to benchmark across multiple platforms. We also use HDL Verifier to validate and demonstrate the generated VHDL running on a Xilinx ML605 evaluation board.

Accelerating  Development 

The  combination  of  MATLAB,  Simulink,  Fixed-Point  Designer,  HDL  Coder,  and  HDL  Verifier  enables  us  to  develop,  implement,  and 
improve  our  encryption  scheme  much  faster  than  would  be  possible  with  traditional  methods.  Speed  is  essential  to  our  efforts  because 
FHE  theory  is  evolving  so  rapidly.  Several  times  a  year,  innovations  come  to  light  that  require  a  rewrite  of  our  code  and  subsequent 
changes to our model. We estimate that development would take two to three times longer if we were working in C, Python, or another low-level language. Without Simulink and HDL Coder, production of VHDL code would be similarly slowed because our team does not have experience writing VHDL from scratch.


As  we  continue  to  increase  the  capabilities  of  our  SHE  scheme  and  improve  its  performance,  rapid  prototyping  in  MATLAB  and 
Simulink  and  automatic  VHDL  code  generation  with  HDL  Coder  remain  central  to  our  development  efforts. 
Abstract: One of the goals of the DARPA PROCEED program has been accelerating the development of a practical Fully Homomorphic Encryption (FHE) scheme. For the past three years, this program has succeeded in accelerating various aspects of the FHE concept toward practical implementation and use. FHE is a game-changing technology to enable secure, general computation on encrypted data on untrusted off-site hardware, without the data ever being decrypted for processing. FHE schemes developed under PROCEED have achieved multiple orders of magnitude improvement in computation, but further means of acceleration, such as implementations on specialized hardware like an FPGA, can improve the speed of computation even further.

The current interest in FHE computation resulted from breakthroughs demonstrating the existence of FHE schemes [1, 2] that allowed arbitrary computation on encrypted data. Specifically, our contribution to the Proceed program has been the development of FPGA-based hardware primitives to accelerate the computation on encrypted data using an FHE cryptosystem based on NTRU-like lattice techniques [3], with additional support for efficient key switching and modulus reduction operations to reduce the frequency of bootstrapping operations [4]. Cipher texts in our scheme are represented as rectangular matrices of 64-bit integers. This bounding of the operand sizes has allowed us to take advantage of modern code generation tools developed by MathWorks to implement VHDL code for FPGA circuits directly from Simulink models. Furthermore, the implicit parallelism of the scheme allows for large amounts of pipelining in the implementation in order to achieve efficient throughput. The resulting VHDL is integrated into an AXI4 bus “Soft System on Chip” using Xilinx platform studio and a Microblaze soft core processor running on a Virtex 7 VC707 evaluation board. This report presents new Simulink primitives that had to be developed to deal with these new requirements.

Keywords: Fully Homomorphic Encryption; Co-processor; SIMULINK; FPGA

I. Introduction - A Quick Review of Fully- and Somewhat-Homomorphic Encryption

Our  team  recently  published  our  work  to  design,  implement 
and  evaluate  a  scalable  FHE  scheme  which  addresses  the 
limitations  for  secure  arbitrary  computation  [4].  Our 
implementation  uses  a  variation  of  a  not  previously 
implemented  bootstrapping  scheme  [5]  simplified  for  power- 
of-2  rings.  We  also  use  a  “double-Chinese  Remainder 
Transform (CRT)” representation of cipher texts which is
discussed  in  [6].  With  this  double-CRT  representation,  we  can 
select  parameters  so  that  cipher  texts  are  secure  when 
represented  as  matrices  of  64  bit  integers,  but  still  support  the 
secure  execution  of  programs  on  commodity  computing 
devices  without  expending  unnecessary  computational 
overhead  manipulating  large  multi-hundred-bit  or  even  multi- 
thousand-bit  integers.  Additionally,  the  parallelism  implicit  in 
this  data  representation  is  easily  exploited  to  achieve 
efficiencies  during  implementation. 

Our implementation encrypts a plaintext bit into a two-dimensional array of 64-bit unsigned integers¹. We use a residue number system implementation to represent cipher texts as $T$ sets of length-$N$ integer vectors. The ring in tower entry $t$ has a unique modulus $q_t$ which bounds all entries in that ring. The $n$ dimension is known as the ring size, and the $t$ dimension as the tower size. This representation allows us to operate in parallel on the smaller bit-width modulo-$q_t$ values instead of on a single modulus $q$ of much larger bit width, where $q = q_1 \cdot q_2 \cdots q_T$ for pairwise co-prime moduli $q_t$.
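The residue number system idea can be illustrated with a short Python sketch. It is a toy under stated assumptions: the moduli are three small primes chosen only for readability (the scheme itself uses moduli of up to 64 bits), and the helper names `to_towers` and `from_towers` are hypothetical. It shows that arithmetic done independently per tower agrees with arithmetic modulo the large product q.

```python
from math import prod


def to_towers(x, moduli):
    """Represent x by its residues modulo each tower modulus."""
    return [x % q_t for q_t in moduli]


def from_towers(residues, moduli):
    """Recombine residues into the unique value modulo the product of moduli.

    Uses the standard Chinese remainder reconstruction; the moduli must be
    pairwise co-prime.
    """
    q = prod(moduli)
    x = 0
    for r, q_t in zip(residues, moduli):
        partial = q // q_t                        # product of the other moduli
        x += r * partial * pow(partial, -1, q_t)  # inverse taken modulo q_t
    return x % q


def tower_mul(a, b, moduli):
    """Multiply two tower representations independently in each tower."""
    return [(x * y) % q_t for x, y, q_t in zip(a, b, moduli)]
```

Each tower only ever handles values smaller than its own modulus, which is what lets hardware work on fixed-width words while the represented modulus q grows with the number of towers.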

As outlined in [4], our implementation requires only a few elementary operations to be implemented on the FPGA hardware in order to achieve large run time speedups over conventional CPU implementations. These operations are:

• RingAdd: $c_{n,t} = (a_{n,t} + b_{n,t}) \bmod q_t$

• RingSub: $c_{n,t} = (a_{n,t} - b_{n,t}) \bmod q_t$

• RingMul: $c_{n,t} = (a_{n,t} \cdot b_{n,t}) \bmod q_t$

All three of the above operations can be parallelized or pipelined over both $n$ and $t$. Also required are the

•  CRT  and  Inverse  CRT,  which  are  implemented  as  a 
Number  Theoretic  Transform  [7]  coupled  with  a  pre-  or  post- 
RingMul  with  an  appropriate  Twiddle  Vector. 

•  Round:  A  function  to  perform  modulo  rounding  using 
different  tower  moduli  (detailed  below). 

¹ While the actual number of bits is determined by the parameter selection of the cryptosystem, we select 64 as our maximum dimension for FPGA implementation.

In our cryptosystem, two key operations are defined: EvalAdd and EvalMult. When our parameters are chosen such that a single plaintext bit is encrypted, the resulting operations on the encrypted data are XOR and AND respectively. These two operations allow us to implement any Boolean operation on input cipher texts².
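The sufficiency of XOR and AND can be checked on plaintext bits with a few lines of Python. This sketch works on unencrypted 0/1 values only and stands in for the homomorphic operations; the function names are hypothetical, and the constant 1 would be an encryption of 1 in the real scheme. It uses the identities NOT(a) = XOR(a, 1) and NAND(a, b) = NOT(AND(a, b)).

```python
def eval_add(a, b):
    """Plaintext stand-in for EvalAdd on single bits: XOR."""
    return a ^ b


def eval_mult(a, b):
    """Plaintext stand-in for EvalMult on single bits: AND."""
    return a & b


def not_gate(a):
    """NOT built from XOR with the constant 1."""
    return eval_add(a, 1)


def nand_gate(a, b):
    """NAND built from AND followed by NOT."""
    return not_gate(eval_mult(a, b))


def or_gate(a, b):
    """OR built from NAND and NOT (De Morgan), as one further example."""
    return nand_gate(not_gate(a), not_gate(b))
```

Note that `nand_gate` costs one multiplication while `not_gate` costs none, so the multiplicative depth of a circuit, not its total gate count, is what matters for the noise budget discussed next.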

Our cryptosystem, like many FHE systems, is random (noisy) in nature. Because of this, only a limited number of operations can be performed on the encrypted data before the noise dominates and decryption is no longer guaranteed. EvalAdd does not add noise to the system, so an unlimited number of such operations are allowed to be chained together. EvalMult however does add noise, and this limits the number of such operations that can be chained together. The double CRT representation allows a very straightforward implementation that controls this noise. This requires the use of both key switching and modulus reduction whenever an EvalMult is performed. The combination of these three steps is known as a Composed EvalMult (CEM). The property of CEM is that for a pair of inputs of a given tower size $t$, the output is a cipher text of tower size $t-1$. Thus for an initial tower size of $T$, at most $(T-1)$ CEM operations can be performed before the noise in the cipher text grows beyond the point where it can be reliably decrypted. An implementation that has a limit to the allowable number of Homomorphic operations is called Somewhat Homomorphic.
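The tower-size bookkeeping behind the (T-1) limit can be sketched as follows. This Python fragment models only the tower size of a cipher text, not its contents; the class and function names are hypothetical and the error handling is an illustrative choice.

```python
class TowerTrackedCiphertext:
    """Tracks only the tower size of a cipher text (payload omitted)."""

    def __init__(self, tower_size):
        if tower_size < 1:
            raise ValueError("tower size must be at least 1")
        self.tower_size = tower_size


def composed_eval_mult(x, y):
    """Model of a CEM: inputs of tower size t give an output of size t - 1."""
    if x.tower_size != y.tower_size:
        raise ValueError("inputs must have the same tower size")
    if x.tower_size <= 1:
        # No tower is left to drop; a refresh would be needed to continue.
        raise RuntimeError("tower exhausted: no further CEM is possible")
    return TowerTrackedCiphertext(x.tower_size - 1)


def max_cem_depth(initial_tower_size):
    """Longest chain of CEM operations for initial tower size T."""
    return initial_tower_size - 1
```

Starting from tower size 4, for example, three successive CEMs succeed and a fourth raises, matching `max_cem_depth(4) == 3`.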

The dimensions of the cryptosystem are determined algorithmically, and are a function of the security required and the number of CEM operations required to implement the desired application. If the number of operations required by the application exceeds O(16), then a Bootstrapping operation will be required to reset the noise generated by the cryptographic operations. Bootstrapping is currently on the order of 10 CEM equivalent operations for reasonable security parameters. Bootstrapping has the property of taking a cipher text of tower size $t$, and generating a new ‘refreshed’ cipher text of the system’s original tower size $T$. Thus an unlimited number of operations can be performed on the data. This kind of implementation is called Fully Homomorphic. The remainder of our paper will discuss the current FPGA implementation of the functions required for Somewhat Homomorphic operation. The functions needed for Fully Homomorphic operation are planned for implementation in the final phase of the program this year.

II. VHDL Implementations of Fast Modulus Arithmetic and Chinese Remainder Transforms (CRT) Using Simulink-Based Models

A. Optimisations and Refinements To Previous Implementations

We  have  previously  reported  on  our  Simulink-based 
implementations  of  the  three  modulus  arithmetic  functions,  as 
well as the forward and inverse CRT functions [8, 9]. Our
current  work  has  updated  these  implementations  to  allow 
VHDL  code  generation  with  a  doubling  of  circuit  clock  speeds 
to  200  MHz.  This  was  done  by  performing  the  following 
optimizations. 


² Any arbitrary Boolean function can be constructed from NAND operations. Since NOT(a) == XOR(a, 1), and NAND(a, b) == NOT(AND(a, b)), the two Homomorphic operations are a sufficient set.


MathWorks determined that by selecting synchronous rather than asynchronous reset in the Simulink to HDL generation parameters, the resulting VHDL mapped more efficiently into the registers built into the DSP48E blocks on the Virtex 7 FPGA, increasing the efficiency of the resulting mapped VHDL by eliminating extra routing traces.

The previous circuits were designed to run at a minimum speed of 100 MHz. We determined that adding explicit pipelining stages in the form of delay lines to the model enabled the Xilinx tools to better optimize FPGA mapping during the place and route stages. Specifically, pipelines were added between arithmetic operations within the RingAdd (4 stages), RingSub (3 stages), and RingMul (188 stages) models. Since our target ring size can be as large as $2^{14}$, and all the towers of a variable are processed sequentially, the delay incurred from filling the pipeline is expected to be minimal.

Once  the  models  were  maximally  pipelined,  we  identified 
several  large  (64  by  64  bit)  product  blocks  within  our  RingMul 
Barrett multiplication implementation [9, 10] as being the
slowest  components,  and  re-implemented  them  as  an  expanded 
multiplication  model  consisting  of  four  parallel  32  by  32  bit 
products,  and  a  pipelined  accumulation  of  partial  sums.  This 
further  increased  the  achievable  clock  speeds.  We  discovered 
that  adding  additional  pipelines  of  length  four,  both  before  and 
after  each  resulting  smaller  product  block  further  allowed  the 
Xilinx  optimizer  to  break  these  product  blocks  into  multiple 
DSP48E  multipliers  in  a  distributed  fashion.  This  allowed  the 
RingMul  circuit  to  perform  at  speeds  in  excess  of  350  MHz, 
well  in  excess  of  our  target  200  MHz. 
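The decomposition of one wide product into four narrower products with accumulated partial sums can be written out as a short Python sketch. It illustrates the arithmetic identity only, not the Simulink model or its pipelining; the function name `mul64_via_32` is hypothetical.

```python
MASK32 = (1 << 32) - 1


def mul64_via_32(a: int, b: int) -> int:
    """Full 128-bit product of two 64-bit values from four 32x32 products."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Four independent 32 by 32 bit products; these can run in parallel.
    p_ll = a_lo * b_lo
    p_lh = a_lo * b_hi
    p_hl = a_hi * b_lo
    p_hh = a_hi * b_hi

    # Accumulate the partial products at their bit offsets.
    return p_ll + ((p_lh + p_hl) << 32) + (p_hh << 64)
```

In hardware the final accumulation is where pipeline registers are most useful, since the shifted partial sums form a carry chain across the full 128-bit result.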

Several of our circuits utilize lookup tables, both for storing the moduli $q_t$ and for storing various twiddle table entries for the CRT and inverse CRT. Our previous direct implementation of the table lookup using the Simulink Lookup function block maps the resulting ROM directly into gate circuitry. This can increase the place and route time drastically for very large tables, and can also result in less efficient circuits. MathWorks determined that placing an additional delay line with a "ResetType = none" HDL block property lets the Xilinx tools map the table to block RAM in the FPGA, which is a more efficient utilization of resources on the chip.

B.  FPGA  Hardware  Selection 

Our  FPGA  selection  was  driven  by  the  need  for  a  large 
number  of  hardware  multipliers  on  the  chip.  Due  to  cost 
constraints  we  wanted  to  use  a  commercial  off-the-shelf  FPGA 
board  for  our  experiments.  Our  selection  of  the  Virtex  7 
VC707  evaluation  board  was  driven  by  the  following  sizing 
requirements. Our target ring size of $2^{14}$ requires 1110 DSP48
blocks  for  the  CRT  and  the  same  number  for  the  inverse  CRT. 
The  VC707  has  a  Virtex  7  485T  chip  which  contains  2800 
such  blocks,  more  than  sufficient  to  implement  our  projected 
set  of  FHE  primitives.  Additionally,  we  require  on-board  DDR 
memory  for  storage  of  encrypted  variables,  and  high  speed 
Ethernet  and  PCI  interfaces  to  exchange  data  with  the  host 
computer.  All  these  are  present  on  the  VC707. 
C. FPGA System Architecture

The design goal of our FPGA system was to be able to operate as an attached processor to accelerate the FHE primitive operations in a way that allows one to chain together several operations in order to minimize the overhead due to data transfer. An attached processor design was developed in which a software programmable microcontroller would manage I/O communications with the host via Ethernet or PCI memory map, manage on-board data storage in the form of an encrypted register file, and manage data transfer to and from the FHE primitive modules in as efficient a manner as possible.

We  decided  to  use  the  Xilinx  Platform  Studio  Microblaze 
soft  core  processor  and  AXI4  interconnect  architecture  to 
implement  the  attached  FHE  processor.  Fig.  1  shows  a  system 
block  diagram  of  the  resulting  system.  The  Xilinx  platform 
studio  enables  us  to  implement  our  FHE  primitives  as 
streaming  co-processors  on  the  AXI  bus.  An  AXI4  lite  bus  is 
used  to  set  control  parameters  of  our  Ring  operation  circuits, 
such  as  ring  size,  and  tower  size. 

The  main  AXI4  interconnect  is  a  256  bit  bus  connecting  the 
DDR  ram  with  the  various  FHE  primitives.  The  I/O  rate  into 
and  out  of  DDR  memory  limits  the  overall  processing  speed  of 
the  system.  Our  RingAdd,  RingSub  and  RingMul  primitives 
each require two input streams and one output stream. Fig. 2
shows  how  we  currently  integrate  our  FHE  primitives  with  the 
AXI4  stream  interconnect.  Each  of  these  three  operations  is 
parallelized  across  ring  elements  as  well  as  tower  indices. 
These  data  streams  are  implemented  using  a  pair  of  AXI4 
DMA  controllers,  each  handling  one  input  and  one  output. 
Data  is  clocked  in  and  out  of  the  bus  at  400  MHz,  and  streamed 
via  individual  AXIS  buses  between  the  DMAs  to  the  AXI 
stream  blocks  where  they  are  buffered  with  FIFOs  and  split 
into  eight  parallel  64  bit  input  data  streams,  and  four  64  bit 
output  data  streams.  Current  implementations  of  these  three 
functions  are  clocked  at  100  MHz,  so  four  parallel 
instantiations  of  each  operation  are  used  to  keep  the  I/O 
pipelines  full.  Future  implementations  of  these  primitives  are 
planned to be clocked at 200 MHz, and as such will only support two instantiations in parallel.


Figure 1: System block diagram showing major components and the AXI4 interconnect.


The forward and inverse CRT modules require slightly different interfaces. CRT operations are parallelizable across tower entry but not across ring index. Thus CRTs cannot be parallelized in the same way as the Ring operations. Currently we have a single CRT or inverse CRT at a time operating at 100 MHz. Future implementations will run these two operations at 200 MHz, but the multiplier resources required for the planned ring size of $2^{14}$ will prohibit mapping more than one forward and one inverse CRT onto the 485T chip.

D.  Microblaze  Software  Architecture 

The Xilinx platform studio is used to implement a Microblaze soft core processor. The system architecture is based on the demo hardware self-test example that is provided with the Xilinx board. The software architecture is based on the web-service example provided with the Xilinx Virtex 6 ML605 evaluation kit, updated with the Xilinx SGMII 144 Ethernet controller. The software controlling the system on the Microblaze is written in C code. The PC “host” end of the software interface is also written in C. The host interface currently is implemented in two versions. The first is a stand-alone test bench that can test and exercise the operation of the attached FHE processor. The second version interfaces with Matlab via a file interchange mechanism to support demonstration Homomorphic Encryption application programs. The interfaces use either Ethernet or PCI bus I/O based on compile flags.

The system software is multithreaded to allow the use of Ethernet
TCP/IP socket I/O. A network thread manages socket-level I/O
between the host and the attached processor. Another thread reads
the incoming messages from the socket, parses the commands received
and dispatches execution to various subroutines. The PCI interface
is written to emulate the buffer I/O of the Ethernet interface,
allowing the same software to be used for both Ethernet and PCI
operation.

The DDR3 RAM is partitioned into a set of register data structures,
as well as a set of internal registers to store constants used in
our encryption schemes. Each register can hold one encrypted bit in
the form of a two-dimensional vector of unsigned long longs that
are allocated out of DDR RAM. One dimension (the fastest index) is
the ring size N and is a compiled constant. The other dimension,
the tower size, varies with the state of the register. Typically
registers are loaded into the FHE coprocessor with a fixed starting
number of tower elements (up to MAXTOWERSIZE = 32 elements). We
eliminate the highest tower entries one by one as each CEM
operation is performed.

Figure 2: Integration of FHE primitives with the AXI stream data streams.

The registers are allocated out of heap in the DDR RAM. There are
three flavors of registers: Input, Output and Scratch. This design
decision was made in order to allow us to later segregate I/O and
Scratch registers into different memory locations if that were to
increase throughput (allowing simultaneous host access to the I/O
registers while the FPGA is processing with the Scratch registers).
The quantity of each register type is software defined at compile
time, but there is usually a small number of Input and Output
registers and as many Scratch registers as will utilize all the
available heap space. Control structures mark the current tower
size of each register and whether the register is in use.
Registers are aligned to 32-byte address boundaries in order to
allow the AXI4 DMA engines to move the register data into and out
of the FHE primitives. This format allows the contents of an entire
register (all used towers) to be streamed with only one DMA
transfer.
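
The register layout described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the controller's C code: the ring size of 2^14 is the planned target rather than a confirmed build setting, and the function names and the simple bump allocator are hypothetical.

```python
# Illustrative sketch of the register layout; all names are hypothetical.
RING_SIZE_N = 1 << 14      # assumed ring size (the planned 2^14 target)
MAX_TOWER_SIZE = 32        # MAXTOWERSIZE from the text
WORD_BYTES = 8             # one unsigned long long per ring element
ALIGNMENT = 32             # byte boundary required by the DMA engines


def align_up(addr, alignment=ALIGNMENT):
    """Round addr up to the next multiple of alignment."""
    return (addr + alignment - 1) // alignment * alignment


def register_bytes(tower_size, ring_size=RING_SIZE_N):
    """Bytes occupied by the active towers of one register.

    Because the towers are stored back to back, this is also the length
    of the single DMA transfer that streams the whole register.
    """
    return tower_size * ring_size * WORD_BYTES


def element_offset(tower, index, ring_size=RING_SIZE_N):
    """Byte offset of element (tower, index); the ring index varies fastest."""
    return (tower * ring_size + index) * WORD_BYTES


def allocate_registers(heap_base, count, tower_size=MAX_TOWER_SIZE):
    """Assign aligned base addresses to `count` registers (bump allocation)."""
    addresses = []
    addr = align_up(heap_base)
    for _ in range(count):
        addresses.append(addr)
        addr = align_up(addr + register_bytes(tower_size))
    return addresses
```

Under these assumptions a full register of 32 towers occupies 32 x 16384 x 8 bytes = 4 MiB, and dropping one tower after a CEM shortens the corresponding DMA transfer by 128 KiB.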

The communication protocol between the PC host and the FPGA board
is message based. The messages are in ASCII. Messages can span
multiple socket buffers, with multiple socket calls made until
enough text has been received to complete a message (a double cr/lf
indicates the end of a message). Each message can contain several
instructions to the processor, separated by cr/lf. Each processor
instruction is then parsed. The instruction text starts with a
keyword that defines the rest of the instruction format. The
keywords are shown in Table 1. The system’s assembly language has
the syntax shown in Table 2.
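
The message framing described above can be sketched as follows. This is a Python illustration rather than the C controller code; the class and function names are hypothetical, and the argument format after each keyword (for example `LOAD 0 12 34`) is an assumption, since the text only specifies that each instruction begins with a keyword.

```python
# Sketch of the ASCII message framing; names and argument formats are assumed.
CRLF = "\r\n"
END_OF_MESSAGE = CRLF + CRLF   # a double cr/lf terminates a message
KEYWORDS = {"LOAD", "GET", "STATUS", "PROG", "RUN", "CRT", "ICRT", "CEM", "RESET"}


class MessageAssembler:
    """Accumulates socket reads until at least one full message is present."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk):
        """Append one socket read and return any completed messages."""
        self._buffer += chunk
        messages = []
        while END_OF_MESSAGE in self._buffer:
            message, self._buffer = self._buffer.split(END_OF_MESSAGE, 1)
            messages.append(message)
        return messages


def parse_message(message):
    """Split a message into (keyword, remainder) instruction pairs."""
    instructions = []
    for line in message.split(CRLF):
        line = line.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword not in KEYWORDS:
            raise ValueError("unknown keyword: " + keyword)
        instructions.append((keyword, rest.strip()))
    return instructions
```

Keeping the partial text in a buffer between reads is what lets a message span several socket buffers, including the case where the terminating double cr/lf itself is split across two reads.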


TABLE I. Control Protocol Keywords

Keyword          Function

LOAD             Transfer the contents of the message (ASCII) into a
                 particular Input register.

GET              Request the contents of a particular Output register to
                 be loaded into an ASCII message buffer and sent back to
                 the host.

STATUS           Generates a short report on the FPGA board console for
                 debugging, showing the contents of all used registers
                 and a listing of the current program loaded.

PROG             Loads a sequence of operations to be performed on the
                 register data, in a simple assembly language.

RUN              Starts a software Finite State Machine to run the
                 stored program to completion.

CRT, ICRT, CEM   A single command that will LOAD two registers, perform
                 a forward CRT, inverse CRT or Composed EvalMult on them
                 and GET the resulting output. Used for accelerating
                 applications that only require these three operations.

RESET            Resets the system to its original state.

TABLE II. Available Opcodes for Homomorphic Encrypted Programs

Opcode   Example                Description

LOAD     R1 = LOAD(In0)         Moves data from an input register to a
                                scratch register; all active tower
                                elements are moved.

STORE    Out4 = STORE(R3)       Moves data from a scratch register to an
                                output register; all active tower
                                elements are moved.

RADD     R2 = RADD(R3, R4)      Sets up DMAs of the two input registers
                                and one output register to the RingAdd
                                circuit. All active tower elements are
                                processed in one large data flow.

RSUB     R2 = RSUB(R3, R4)      Same as RingAdd, except the RingSub
                                circuit is the target/source of the I/O
                                DMAs.

RMUL     R2 = RMUL(R3, R4)      Same as RingAdd, except the RingMul
                                circuit is the target/source of the I/O
                                DMAs.

CRT      R3 = CRT(R1, R2)       Same as RingAdd, except the input and
                                output registers are used as endpoints
                                for pairs of DMA transfers, each moving
                                one half of the ring data. Note that the
                                second input register is used as a
                                scratch register, so its contents are
                                destroyed.

ICRT     R2 = ICRT(R4, R5)      Same as CRT, except an inverse CRT
                                circuit is used.

EMULC    R2 = EMULC(R3, R4)     Executes a ComposedEvalMult in software,
                                which in turn executes several Ring
                                primitives (see below). Note that the
                                output register is one tower smaller
                                than the input registers.

A simple example program is given in Table 3. The program first
moves encrypted data from input register 0 to scratch register 0,
then repeats the process for a second input variable, moving it to
scratch register 1. It then computes a RingAdd, RingSub and RingMul
using the two inputs, storing the results in scratch registers 2, 3
and 4 respectively. It then stores those three results in output
registers 0, 1 and 2 respectively.

TABLE III. Sample program

R0 = LOAD(In0)
R1 = LOAD(In1)
R2 = RADD(R0, R1)
R3 = RSUB(R0, R1)
R4 = RMUL(R0, R1)
Out0 = STORE(R2)
Out1 = STORE(R3)
Out2 = STORE(R4)

Typical system operation would be for the user to execute two LOAD
commands to load the contents of input registers 0 and 1 with
encrypted data (the encryption being done on the secure host). The
user then executes a RUN command to allow the Homomorphic
operations to be run on the unsecure FPGA processor. Then
subsequent calls to GET commands will transfer the resulting
encrypted result data back to the host. Finally decryption would be
done on the secure host.
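
To make the program semantics concrete, the following Python sketch interprets the sample program on toy data. It is a software stand-in under stated assumptions, not the FPGA data path: each register is a list of towers, the ring operations are modelled as element-wise arithmetic modulo a per-tower modulus (which assumes the data is already in CRT representation), and the moduli 97 and 193 and all function names are hypothetical.

```python
import re

# Toy per-tower moduli; real moduli are far larger (assumption for illustration).
MODULI = [97, 193]

SAMPLE_PROGRAM = """
R0 = LOAD(In0)
R1 = LOAD(In1)
R2 = RADD(R0, R1)
R3 = RSUB(R0, R1)
R4 = RMUL(R0, R1)
Out0 = STORE(R2)
Out1 = STORE(R3)
Out2 = STORE(R4)
"""

INSTRUCTION = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\(\s*([^)]*?)\s*\)\s*$")

RING_OPS = {
    "RADD": lambda x, y: x + y,
    "RSUB": lambda x, y: x - y,
    "RMUL": lambda x, y: x * y,
}


def elementwise(op, a, b):
    """Apply op to matching elements of every tower, reducing mod that tower's modulus."""
    return [[op(x, y) % q for x, y in zip(ta, tb)]
            for ta, tb, q in zip(a, b, MODULI)]


def run(program, inputs):
    """Execute the program and return the contents of the output registers."""
    regs = {name: [list(t) for t in towers] for name, towers in inputs.items()}
    for line in program.splitlines():
        if not line.strip():
            continue
        match = INSTRUCTION.match(line)
        if match is None:
            raise ValueError("cannot parse: " + line)
        dest, opcode, arg_text = match.groups()
        args = [a.strip() for a in arg_text.split(",")]
        if opcode in ("LOAD", "STORE"):
            regs[dest] = [list(t) for t in regs[args[0]]]   # copy all active towers
        elif opcode in RING_OPS:
            regs[dest] = elementwise(RING_OPS[opcode], regs[args[0]], regs[args[1]])
        else:
            raise ValueError("unsupported opcode: " + opcode)
    return {name: value for name, value in regs.items() if name.startswith("Out")}
```

With inputs `In0 = [[1, 2], [3, 4]]` and `In1 = [[5, 96], [6, 192]]` (two towers of ring size 2), `run(SAMPLE_PROGRAM, ...)` leaves the sum, difference and product in `Out0`, `Out1` and `Out2`, mirroring the LOAD, RUN, GET sequence described above.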

E.  Microcode  Implementation  of  ComposedEvalMult 

As  mentioned  above,  one  of  the  new  functions 
implemented  in  our  system  is  the  ComposedEvalMult  (CEM) 
which  is  fully  detailed  in  [4].  This  function  is  implemented  in 
our  software  controller  as  a  series  of  C  function  calls,  all  but 
one  of  which  are  executed  with  previously  existing  primitives. 
First,  a  RingMul  operation  performs  the  multiply.  Next  a  key- 
switch  operation  is  performed  consisting  of  another  RingMul 
of  the  product  with  a  hint  variable  defined  by  the  cryptosystem. 
Then,  a  modulus  reduction  operation  is  performed  on  the  single 
highest  tower  entry  of  the  result  which  consists  of  an  inverse 
CRT  and  a  new  Rounding  operation. 

This  Rounding  operation  is  implemented  as  a  new 
hardware  function  because  it  contains  operations  not  available 
in  the  other  ring  functions.  Fig.  3  shows  the  Simulink  Model 
consisting  of  a  modified  EvalMult  operation  (using  a  modified 
set  of  moduli  qi),  and  a  pair  of  operations  selected  by  the  range 
of  the  result  which  ensure  the  output  is  bounded  within  an 
appropriate  range.  The  operations  are  performed  in  a  pipelined 
manner as well, to allow execution at 200 MHz.

The  result  of  the  rounding  operation  is  a  pair  of  new  ring 
vectors  that  are  then  in  turn  applied  to  each  remaining  tower 
entry  to  reduce  the  noise  accumulated  by  the  initial  product. 
These  vectors  are  first  processed  with  a  series  of  RingAdds, 
RingSubs  and  a  CRT  using  each  of  the  corresponding  ring 
moduli. The end result is that the highest tower ring is
eliminated from the ciphertext, and the overall noise of the
system remains at a usable (i.e., decryptable) level.

III.  Current  Results  and  Next  Steps 

Our  presentation  will  include  I/O  timing,  run-time  and  chip 
utilization  details  of  our  attached  processor  performing  the 
suite  of  ring  primitives  on  various  ring  sizes,  based  on  the 
implementation  in  our  Virtex  7  VC707  evaluation  board. 


Future  plans  for  our  FPGA  system  include  adding  all  Ring 
primitives  that  will  be  required  to  accelerate  the  Bootstrapping 
operation  described  in  [4].  The  CRT  and  inverse  CRT 
operations  will  be  modified  to  allow  the  Number  Theoretic 
Transform  (NTT)  portion  [7]  to  be  combined  into  one  circuit, 
saving  a  large  amount  of  FPGA  multiplier  resources. 
Additionally,  multiple  ring  sizes  will  be  supported  by 
modifications  to  the  NTT  to  support  multiple  power-of-two 
ring  sizes.  This  will  allow  us  to  support  the  Ring  Reduction 
operation in [4] for increased computational efficiency. Note
that all of the other primitives can operate at arbitrary ring
sizes. The final target ring size is 2^14, which will support
relatively secure computation.

Acknowledgment 

We  would  like  to  acknowledge  Christopher  Peikert  for  all 
his  numerous  invaluable  contributions  to  the  theoretical  and 
practical  implementation  aspects  of  this  project. 

References 

[1] C. Gentry and S. Halevi. “Implementing Gentry’s Fully-Homomorphic
encryption scheme.” In Kenneth Paterson, editor, Advances in
Cryptology - EUROCRYPT 2011, volume 6632 of Lecture Notes in
Computer Science, chapter 9, pages 129-148. Springer, 2011.

[2] D. Micciancio. “A first glimpse of cryptography's Holy Grail.” Comm.
ACM 53, 3 (March 2010), 96-96.

[3] V. Lyubashevsky, C. Peikert, and O. Regev. “On ideal lattices and
learning with errors over rings.” In Henri Gilbert, editor, Advances in
Cryptology - EUROCRYPT 2010, volume 6110 of Lecture Notes in
Computer Science, chapter 1, pages 1-23. Springer Berlin / Heidelberg.

[4] K. Rohloff and D. B. Cousins. “A Scalable Implementation of Fully
Homomorphic Encryption Built on NTRU.” 2nd Workshop on Applied
Homomorphic Cryptography and Encrypted Computing (WAHC), Mar.
7, 2014.

[5] J. Alperin-Sheriff and C. Peikert. “Practical bootstrapping in quasilinear
time.” In Ran Canetti and Juan A. Garay, editors, Advances in
Cryptology - CRYPTO 2013, volume 8042 of Lecture Notes in Computer
Science, pages 1-20. Springer Berlin Heidelberg, 2013.

[6] C. Gentry, S. Halevi, and N. Smart. “Homomorphic evaluation of the
AES circuit.” In Reihaneh Safavi-Naini and Ran Canetti, editors,
Advances in Cryptology - CRYPTO 2012, volume 7417 of Lecture Notes
in Computer Science, pages 850-867. Springer Berlin / Heidelberg,
2012.

[7] H. Cohen. A Course in Computational Algebraic Number Theory. New
York: Springer-Verlag, 1993.

[8] D. Cousins, K. Rohloff, C. Peikert, and R. Schantz. “Scalable
Implementation of Primitives for Homomorphic EncRyption - FPGA
implementation using Simulink.” 2011 High Performance Extreme
Computing Workshop, Sep. 21-22, 2011, Lexington, MA.

[9] D. Cousins, K. Rohloff, C. Peikert, and R. Schantz. “An Update on SIPHER
(Scalable Implementation of Primitives for Homomorphic EncRyption)
- FPGA implementation using Simulink.” 2012 IEEE Conference on
High Performance Extreme Computing (HPEC), Sep. 10-12, 2012,
Waltham, MA.

[10] M. Knezevic, F. Vercauteren, and I. Verbauwhede. “Faster Interleaved
Modular Multiplication Based on Barrett and Montgomery Reduction
Methods.” IEEE Transactions on Computers, Vol. 59, No. 12, Dec. 2010.


Computing with Data Privacy: Steps toward Realization


David  W.  Archer  |  Galois 

Kurt  Rohloff  |  New  Jersey  Institute  of  Technology 


Two  new  cryptographic  methods — linear  secret  sharing  (LSS)  and  fully  homomorphic  encryption 
(FHE) — allow  computing  on  sensitive  data  without  decrypting  it.  LSS  and  FHE  differ  in  speed,  ease  of  use, 
computational  primitives,  and  cost. 




Users often don’t trust computing environments
such as shared clouds to perform computation on
sensitive data. Only recently has it become possible to
address this trust concern with general-purpose computation
on encrypted data. In this article, we discuss two
forms of such computation: linear secret sharing (LSS)1
and fully homomorphic encryption (FHE).2

In LSS, a user or group of users, each with private
data, encrypts the data and sends it to a group of
untrusted  servers.  These  servers  share  the  computation 
without  decrypting  the  data  and  return  still-encrypted 
results.  In  FHE,  a  user  encrypts  data  and  sends  it  to  a 
single  untrusted  server,  which  computes  an  encrypted 
answer  and  returns  it  to  the  user. 

Computation with both approaches is many
orders of magnitude slower than computation “in the
clear.” In addition, LSS requires multiple servers to perform
computation and significant communication bandwidth
among them. FHE typically imposes significant
expansion in ciphertext size relative to plaintext, which
affects both memory utilization and network bandwidth.

We created prototypes including LSS- and homomorphic
encryption (HE)-based variations of voice-over-IP
(VoIP) teleconferencing systems using Amazon Elastic
Cloud nodes to mix encrypted voice streams from iPhone
handsets, an LSS-based email guard using regular expression
search to determine which messages to transmit, and
an FHE-based email guard using string comparison to filter
email.

Protocols,  Adversary  Models, 
and  Security  Guarantees 

Here,  we  describe  our  secure  computation  systems 
as  well  as  applicable  adversary  models  and  security 
guarantees. 

Linear  Secret  Sharing 

In LSS, multiple proxies collaboratively compute a
function on behalf of one or more clients.1 Each client
distributes to each proxy a share of its secret input. Each
share is essentially random: a fixed linear function of
the secret input and random values selected by the client.
Thus, no proxies learn anything about the input.
LSS works because its systems exhibit homomorphisms
to mathematical structures of interest such as the integers,
allowing parties holding shares to compute functions
of secrets by arithmetically manipulating only
their shares of those secrets.

1540-7993/15/$31.00 © 2015 IEEE


Figure  1.  Our  linear  secret  sharing  (LSS)  protocol.  Each  client  encrypts  its  input  with  three  cipher  streams,  producing  a 
metashare  (step  1).  Each  client  transmits  its  metashare  to  the  coordination  server  (not  shown)  over  a  secure  channel, 
which  in  turn  distributes  these  metashares  to  the  three  proxies  (step  2).  Each  proxy  computes  its  share  from  each 
metashare  by  decrypting  the  metashare  using  one  cipher  stream  (that  it  and  the  client  providing  the  metashare  both 
know),  and  then  performs  the  desired  computation,  communicating  with  other  proxies  as  needed  over  secure  channels 
(step  3).  Each  proxy  encrypts  its  result  share  using  a  distinct  cipher  stream  it  shares  with  the  client,  and  then  sends  it  to  the 
coordination  server,  which  computes  the  XOR  of  all  result  shares  into  a  result  metashare  and  forwards  this  to  clients  (step 
4).  Each  client  decrypts  the  metashare  to  obtain  the  computation  result  (step  5). 


As a simple example, suppose clients Alice and Bob
agree to add secret inputs X and Y that they respectively
hold. Assume X and Y are in [0 ... 2^n - 1] for natural
number n. Alice computes three shares of X by choosing
random X1 and X2 from [0 ... 2^n - 1], and then choosing
X3 such that X = (X1 + X2 + X3) mod 2^n. Alice then
distributes these three shares to the proxies over secure
channels, such that each proxy holds one distinct share.
Bob does the same for Y. The proxies add their shares,
resulting in each proxy holding one of X1 + Y1, X2 + Y2,
or X3 + Y3. Note that none of these result shares reveal
anything about X + Y to the proxies that hold them. The
proxies send these result shares to Alice or Bob over
secure channels. Bob or Alice then adds them together
(mod 2^n) to obtain X + Y.
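
The Alice-and-Bob example can be sketched in Python as follows. This is a minimal illustration of additive sharing modulo 2^n only; the function names are hypothetical, and the sketch leaves out the proxies, channels and metashares used in our actual protocol.

```python
import secrets


def share(secret, n_bits, parties=3):
    """Split secret into `parties` additive shares modulo 2^n_bits."""
    modulus = 1 << n_bits
    shares = [secrets.randbelow(modulus) for _ in range(parties - 1)]
    # The last share is fixed so that all shares sum to the secret mod 2^n.
    shares.append((secret - sum(shares)) % modulus)
    return shares


def add_shares(x_shares, y_shares, n_bits):
    """Each proxy adds the two shares it holds; no communication is needed."""
    modulus = 1 << n_bits
    return [(x + y) % modulus for x, y in zip(x_shares, y_shares)]


def reconstruct(shares, n_bits):
    """Sum the shares modulo 2^n_bits to recover the shared value."""
    return sum(shares) % (1 << n_bits)
```

For example, with n = 8, sharing X = 200 and Y = 100 and reconstructing the added shares yields 44, which is (200 + 100) mod 2^8: the result wraps around the modulus just as unsigned 8-bit addition would.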

While communication from clients to proxies in typical
LSS systems is direct, we found that having mobile
clients distribute shares directly to each proxy resulted
in substantial loading of client Wi-Fi channels. To
address such Wi-Fi overload, we extended our applications’
communication model to introduce an untrusted
coordination server. Clients cryptographically combine
all three shares they compute into a single metashare
that’s sent to the coordination server. This server, which
we locate in a richer bandwidth environment along with
the proxies, distributes the metashare to all three proxies,
which compute their own shares from the metashare
and preshared key material.

The core of our LSS system, ShareMonad, consists of
a Haskell-embedded (www.haskell.org) domain-specific
language (DSL) for expressing LSS computation, a
compiler to transform ShareMonad code into abstract
syntax trees suitable for interpretation, and a three-proxy
LSS interpreter. Each proxy in a ShareMonad application
runs this interpreter. Clients and coordination servers
run application code that interoperates with the proxy
code. Thus, each of our LSS applications consists of a
composition of code running on clients, coordination
server code, and ShareMonad code running on proxies.

Our LSS DSL provides operations including addition,
subtraction, multiplication, unsigned division, comparisons,
bitwise shift right, conversion between [0 ...
2^n - 1] and bit vector representations, table lookups, and
operations on bit vectors. ShareMonad protocols currently
assume an honest but curious adversary: proxies
are assumed to compute and communicate as agreed but
might observe attached channels and local computations.

As Figure 1 shows, our LSS protocols typically proceed
in several steps:

1. Each client encrypts its input with three cipher
streams, producing a metashare.

2. Each client transmits its metashare to the coordination
server (not shown) over a secure channel,
which in turn distributes these metashares to the
three proxies.

3. Each proxy computes its share from each metashare
by decrypting the metashare using one cipher stream
(that it and the client providing the metashare both
know), and then performs the desired computation,
communicating with other proxies as needed
over secure channels.

4. Each proxy encrypts its result share using a distinct
cipher stream it shares with the client, and then
sends it to the coordination server, which computes
the XOR of all result shares into a result metashare
and forwards this to clients.

5. Each client decrypts the metashare to obtain the
computation result.

We compute metashares and shares as follows. We
distribute in advance a cryptographic key between
each client CL and each proxy A, B, and C. This key
seeds stream ciphers used to form metashares from
input data. To compute the metashare Xm of secret X,
a stream cipher is used to generate a random value RA
that undergoes bitwise XOR with X. CL repeats this
process with the cryptographic keys it shares with B and
C, obtaining Xm = X XOR RA XOR RB XOR RC, which
it sends to the coordination server to be forwarded to
all three proxies. A uses the key it shares with CL to
compute RA, which it uses to compute its share of X,
X1 = Xm XOR RA = X XOR RB XOR RC. B and C similarly
compute their shares X2 and X3, respectively. Note that
X can trivially be recovered from these shares:
X = X1 XOR X2 XOR X3.
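
The metashare construction can be checked with a short Python sketch. The keystream function below is a stand-in assumption: it derives pad values with HMAC-SHA-256 over a counter rather than with the stream ciphers used in our system, and the key values, counter scheme and function names are all hypothetical.

```python
import hashlib
import hmac


def keystream_value(key, counter, n_bytes=4):
    """Stand-in for a stream-cipher output: a pad value derived from key and counter."""
    digest = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:n_bytes], "big")


def make_metashare(secret, proxy_keys, counter):
    """Client side: Xm = X XOR RA XOR RB XOR RC."""
    metashare = secret
    for key in proxy_keys:
        metashare ^= keystream_value(key, counter)
    return metashare


def proxy_share(metashare, own_key, counter):
    """Proxy side: strip this proxy's own pad, e.g. X1 = Xm XOR RA."""
    return metashare ^ keystream_value(own_key, counter)
```

Each proxy's share still carries the other two pads, so no single proxy learns X, while the XOR of all three shares cancels every pad twice over and returns X.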

Once shares are computed, computation of the
desired function proceeds on the proxies. In the case of
addition, no communication among proxies is required:
A computes result share R1 = X1 + Y1, B computes R2 =
X2 + Y2, and C computes R3 = X3 + Y3. Once computation
is complete, A, B, and C send R1, R2, and R3, respectively,
to the coordination server, which computes the
values’ bitwise XOR, and forwards this single metaresult
to the clients for final decryption.

Note that naively following this return transmission
protocol would reveal all shares of the computation
result to the untrusted coordination server. We avoid
this security lapse by having A, B, and C encrypt R1, R2,
and R3, respectively, using keys shared between A, B, C,
and the client to enable decryption by the client.

Some computations, such as X × Y, require communication
among proxies. X × Y = (X1 + X2 + X3) × (Y1 + Y2 +
Y3) involves not only locally computable terms such as X1
× Y1 but also terms such as X2 × Y3. These terms require
that each proxy communicate its share to one other
proxy. We follow the method that Dan Bogdanov and his
colleagues described: sharing among proxies occurs in
symmetric rounds, and each proxy adds new entropy to
its share before sending that share to a neighbor.5 Thus,
even though proxies communicate their shares to other
proxies, the communicated values don’t allow those proxies
to gain any knowledge of the original secret.
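
The arithmetic behind the cross terms can be verified with the Python sketch below. It only checks that the nine product terms can be divided among three proxies so that each needs the shares of exactly one neighbor. It is deliberately not secure, because it omits the resharing rounds and added entropy of the actual method; the assignment of terms to proxies and all names are assumptions made for illustration.

```python
import secrets

N_BITS = 16
MODULUS = 1 << N_BITS


def share(secret):
    """Additive three-way sharing modulo 2^N_BITS."""
    s1 = secrets.randbelow(MODULUS)
    s2 = secrets.randbelow(MODULUS)
    return [s1, s2, (secret - s1 - s2) % MODULUS]


def product_partials(xs, ys):
    """Partial products per proxy.

    Proxy i uses its own shares X_i, Y_i plus the shares X_j, Y_j of one
    neighbor j = (i + 1) mod 3, covering three of the nine terms of
    (X1 + X2 + X3) * (Y1 + Y2 + Y3).
    """
    partials = []
    for i in range(3):
        j = (i + 1) % 3
        local = xs[i] * ys[i]                   # locally computable term
        cross = xs[i] * ys[j] + xs[j] * ys[i]   # terms needing the neighbor
        partials.append((local + cross) % MODULUS)
    return partials
```

Summing the three partial results modulo 2^16 gives X × Y mod 2^16, so the partials themselves form an additive sharing of the product.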

We  ensure  privacy  in  each  portion  of  our  protocol 
in  Figure  1,  except  those  that  execute  on  the  trusted 
client  platforms: 


■ Passphrase sharing prior to computation is handled
by well-known asymmetric cryptographic (public-key
infrastructure) protocols. The Advanced Encryption
Standard (AES) and the National Institute of
Standards and Technology SP 800-90 standard provide
cryptographically secure random numbers for
creating shares.

■ Transmission of metashares Xm from client to
coordination server and onward to proxies is protected
by the entropy added during creation of Xm.

■ Local computation on the proxies is protected from
observation because it’s performed only on cryptographic
shares. We prevent accumulation of too many
shares on a single proxy by introducing additional
entropy during the sharing process, as we described.

■ Transmission from proxies to the coordination server
is protected by encryption of result shares introduced
by the proxies, which prevents the server from combining
result shares to obtain the result in the clear.
Transmission from the coordination server to the clients
is also protected by this encryption.

Homomorphic  Encryption 

Like all secure encryption schemes, secure HE schemes
make it intractable, under certain computational
hardness assumptions, to recover information about
plaintext from its encrypted ciphertext.2,3 We use a
representative approach to HE that employs a multidimensional
lattice over a finite field. We use a vector
basis to represent the lattice. Each plaintext input to
the computation is encrypted to a ciphertext encoded
as a vector (represented as a large matrix) that is not in the
lattice. Security is based on the closest-vector problem: a
known hard problem of finding the lattice vector with
the least distance to a given vector (in our case, the
ciphertext vector).

Computation on encrypted data proceeds by manipulating
ciphertext matrix representations. However,
encryption embeds noise into these representations.
As computation proceeds, this noise grows. If too much
noise accrues, decryption might identify the wrong
lattice vector and thus return the wrong plaintext. We
can decrease ciphertext noise by increasing the dimensionality
of the ciphertext’s matrix while maintaining
security. Increasing the dimensionality of the matrix
allows for more computation to be performed before
too much noise accumulates but also results in computationally
difficult manipulations of large matrices. Even
with such noise reduction, noise still accumulates, ultimately
limiting the depth of the computation available.
FHE systems such as ours avoid this limitation by bootstrapping:
periodically performing a cryptographic
operation that resets the noise level without compromising
security. Craig Gentry described an early form
of bootstrapping and the resulting capability to perform
arbitrary-depth secure computation.2

Figure 2. Dataflow in our fully homomorphic encryption (FHE) system. The key infrastructure (upper left) runs on a
trusted host and uses the NTRU public-key approach to generate key pairs consisting of a public key (Pk) and a secret key (Sk). Pk is shared
with a data source (on the right) that encodes plaintext messages as mod p integers and then encrypts the data using
that key. A program source (lower right) provides a program, implemented as a Boolean circuit, to be evaluated over the
encrypted data. The ciphertext, a public-key encryption of Sk, and the program are sent to a computation host (cloud,
lower right). The result ciphertext is sent to the client (lower left) that decrypts it using Sk to obtain the plaintext result.

Figure 2 shows our FHE system’s high-level dataflow.
The key infrastructure on the upper left runs on a
trusted host and uses the NTRU4 public-key approach
to generate key pairs consisting of a public key Pk and
secret key Sk. The Pk is shared with a data source (on the
right) that encodes plaintext messages as mod p integers
and then encrypts the data using that key to generate
the initial ciphertext. A program source (on the lower
right) provides a program, implemented as a Boolean
circuit, to be evaluated over the encrypted data. The
initial ciphertext, a public-key encryption of the corresponding
Sk, and the program are sent to a computation
host (shown as a cloud, on the lower right). The resulting
final ciphertext is sent to the client (on the lower
left) that decrypts it using Sk to obtain the plaintext
result. The protocols we use are secure against “honest
but curious” adversaries such as an untrusted host performing
the computation honestly while seeking to discover
secret inputs.

Our FHE programs comprise two computational
primitives: EvalAdd (addition) and EvalMult (multiplication).
We use these primitives to construct operations
for encryption, decryption, and bootstrapping.
We implement modulus reduction, ring reduction,
and key-switching operations to enable larger depth of
computation before bootstrapping, without decreasing
security. (In this article, the term ring refers to a mathematical
ring over the integers.) We also implement
specialized primitives, such as ring addition, ring multiplication,
and Chinese Remainder Theorem (CRT),
because manipulating ciphertexts in CRT representation
is more efficient than in power basis representations.

Some early homomorphic systems relied on encoding
a single bit of plaintext in each ciphertext. EvalAdd
and EvalMult operations were thus simplified into Boolean
XOR and AND operations but offered no computation
parallelism. Ciphertext-to-plaintext expansion in
such systems is quite large: in one of our early examples,
the ciphertext expansion ratio was 2^23. In contrast, our
system encrypts mod p integers (p > 2) instead of single
bits, and we leverage single-instruction, multiple data
(SIMD) approaches to pack multiple mod p integers
into each ciphertext, thus computing parallel operations
on these packed integers. Although this approach offers
more efficiency, leveraging its inherent parallelism can
make algorithm design challenging.

We use a variation of the double-CRT approach along
with a residue number system (based on the CRT over
the integers) to circumvent the problem of large ciphertext
moduli and correspondingly large ciphertext size. For
ring dimension n, each ciphertext is represented by an n x
t matrix of t length-n integer vectors of mod q_i values for
pairwise coprime moduli q_i. This contrasts with some previous
FHE systems that represent ciphertexts as a single
integer vector mod Q, where Q = q_1 * ... * q_t. In our system,
the number of moduli, t, grows to support the secure
execution of larger programs, but the size of the individual
moduli q_1, ..., q_t does not. With this representation, we securely
represent ciphertexts as matrices of 64-bit integers and still
execute efficiently on commodity computing hardware,
on which computation over the multihundred-bit
or multithousand-bit single-vector integer representations
used in previous systems would be infeasible.
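
The residue number system idea can be sketched in a few lines. This is an illustration under stated assumptions rather than the system's code: the three moduli below are hypothetical (distinct Mersenne primes, hence pairwise coprime, each below 2^64), and the helper names are invented for the example. Arithmetic mod Q is carried out independently on small residues, and the CRT recombines them only when the full value is needed.

```python
from math import prod

# Hypothetical pairwise coprime moduli, each small enough for a 64-bit word.
MODULI = [2**61 - 1, 2**31 - 1, 2**19 - 1]
Q = prod(MODULI)  # the large modulus that is never used directly in arithmetic


def to_rns(x, moduli=MODULI):
    """Represent x mod Q by its residues mod each q_i."""
    return [x % q for q in moduli]


def rns_add(a, b, moduli=MODULI):
    """Component-wise addition; each component stays below its own q_i."""
    return [(x + y) % q for x, y, q in zip(a, b, moduli)]


def rns_mul(a, b, moduli=MODULI):
    """Component-wise multiplication; no multi-hundred-bit arithmetic needed."""
    return [(x * y) % q for x, y, q in zip(a, b, moduli)]


def from_rns(residues, moduli=MODULI):
    """Recombine residues into the unique value mod Q (CRT over the integers)."""
    big_q = prod(moduli)
    total = 0
    for r, q in zip(residues, moduli):
        q_rest = big_q // q                      # product of the other moduli
        total += r * q_rest * pow(q_rest, -1, q)  # pow(..., -1, q): inverse mod q
    return total % big_q


x = 123456789012345678901234567890
y = 987654321098765432109876543210
product_residues = rns_mul(to_rns(x), to_rns(y))
sum_residues = rns_add(to_rns(x), to_rns(y))

# A ring element with n coefficients becomes an n x t matrix of residues.
coefficients = [x, y, 42]
matrix = [to_rns(c) for c in coefficients]
```

Supporting deeper computation in this picture means appending more moduli to the list (more columns), while every entry still fits in a machine word.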

The security level of lattice-based homomorphic
encryption systems isn’t often expressed in terms of
the work factor used to describe security in typical
cryptosystems. Instead, security is typically expressed
as the root Hermite factor δ, a representation of the
hardness of the closest-vector problem. A lattice-based
encryption system becomes more secure as δ
approaches 1. We selected the value δ = 1.007 for our
work, which corresponds roughly to the work factor
required to crack AES 128-bit encryption.

The maximum depth of computation d that can be
supported between bootstraps and the ring dimension
n, which correlates directly to the length of ciphertext
vectors, significantly impact both δ and performance.
We have found that with n = 16,384 and d = 16, we
achieve δ = 1.007 while supporting significant computation,
such as searching several pages of encrypted text
for an encrypted keyword, between bootstraps. With
n = 16,384 and efficient packing of ciphertexts, each
ciphertext expands to between 10^3 and 10^6 times larger
than the corresponding plaintext.

Our system runs in a compiled C environment
auto-generated from Matlab implementations
(www.mathworks.com/products/matlab). We use parallelism
to take advantage of multicore processors in a Linux
environment. At δ = 1.007, we encrypt ciphertexts in
less than 100 milliseconds in such environments and
decrypt in approximately 1 millisecond. EvalAdd on
ciphertexts takes several milliseconds, whereas EvalMult
takes approximately 500 milliseconds and bootstrapping
takes approximately five minutes.

Real-World  Potential  for  FHE 
and  LSS  Implementations 

Here,  we  present  our  prototype  applications  and  their 
limitations. 

VoIP  Teleconferencing 

Typical  VoIP  implementations  don’t  provide  end-to-end 
encryption.  Instead,  they  rely  on  a  trusted  server  to  receive 
content  from  clients,  decrypt  that  content,  reencrypt  it, 
and  then  distribute  it  to  other  clients.  This  trusted  server 
is  a  weak  point  in  securing  VoIP  communication. 


Our teams independently developed LSS and FHE
VoIP audio conferencing approaches that provide end-to-end
security with performance suitable for three or
more simultaneous users and high-quality audio. No
prior work has demonstrated the application of these
technologies to streaming applications such as VoIP.
Both our prototypes use Apple iPhone 5s handsets,
Amazon cloud-based virtual servers, and suitably modified
open source VoIP client and server code.

LSS-based VoIP. Figure 3 shows our LSS VoIP architecture.
Each iPhone runs a version of the Mumble VoIP
client application (http://mumble.sourceforge.net)
with the following modifications: Mumble audio processing
samples the microphone at 16 Kbps and logarithmically
compresses this to a standard 8-bit μ-law
floating-point representation [6]. We added encryption
for turning each sample into a metashare by computing
XOR of each sample with elements drawn from three
AES 128-bit counter-mode cipher streams seeded from
pre-placed passphrases. The network interface packs
1,440 sample metashares (90 milliseconds of audio
data) into each transmitted network packet.
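
The metashare construction can be sketched as follows. This is a simplified illustration, not the prototype's code: the Python standard library has no AES, so the `keystream` function below substitutes SHA-256 over a counter for each AES 128-bit counter-mode stream, and the passphrases and sample values are made up. The final recovery step is only a correctness check; in the architecture described here, no single party holds all three streams.

```python
import hashlib

SAMPLES_PER_PACKET = 1440
# 1,440 samples per 90 ms packet corresponds to 16,000 samples per second.
SAMPLES_PER_SECOND = SAMPLES_PER_PACKET * 1000 // 90


def keystream(passphrase: bytes, length: int) -> bytes:
    """Stand-in for an AES-128 counter-mode stream (assumption: SHA-256 of a counter)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(passphrase + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(first: bytes, *others: bytes) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    result = bytearray(first)
    for chunk in others:
        if len(chunk) != len(result):
            raise ValueError("all inputs must have the same length")
        for i, b in enumerate(chunk):
            result[i] ^= b
    return bytes(result)


# Hypothetical pre-placed passphrases, one per proxy.
passphrases = [b"proxy-0 secret", b"proxy-1 secret", b"proxy-2 secret"]
streams = [keystream(p, SAMPLES_PER_PACKET) for p in passphrases]

# Stand-in for one packet of 8-bit encoded audio samples.
samples = bytes(i % 256 for i in range(SAMPLES_PER_PACKET))

metashare = xor_bytes(samples, *streams)     # what the client transmits
recovered = xor_bytes(metashare, *streams)   # check: XOR with all streams undoes it
```

Because XOR is its own inverse, removing any two of the three streams still leaves the samples masked by the third, which is the property the later security argument relies on.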

As Figure 3 shows, each client creates and then sends
each metashare packet via Wi-Fi (802.11ac) to an Apple
Airport Extreme wireless access point, which forwards
it to a virtualized coordination server in the Amazon
Elastic Cloud Service (ECS). This virtual machine runs
a modified version of uMurmur
(https://code.google.com/p/umurmur) to handle user session
management and audio stream routing. Our uMurmur variant
distributes each client audio packet to each of three
proxies, gathers result share packets from those proxies
after computation, computes XOR on the result
shares together sample-wise, and sends the resulting
metashare to clients for decryption.

Our proxies, which are also virtual machines hosted
in the Amazon ECS, run our ShareMonad audio processing
application. Each proxy recreates one of the
three entropy streams and uses this to compute its share
of each sample from the received metashares. Collectively,
the proxies obliviously decode each logarithmically
compressed audio stream to a linear, integer
representation; mix all decoded audio streams together;
clip the resulting audio signal; and recompress the result
for distribution.

This computation repeats for each participating client,
omitting that client’s audio stream so users don’t
hear their own voices. Each audio stream result share is
sent back to the coordination server, where it undergoes
XOR with shares from other proxies and is then sent to
client handsets for decryption and playback.

A  hand-optimized  approach  required  12  seconds 
of  processing  per  1,440-sample  block  for  four  users, 


Figure 3. LSS-based voice-over-IP (VoIP) system architecture. iPhone 5s VoIP clients sample audio input at 16 Kbps, encode
samples to a standard 8-bit μ-law floating-point representation, and encrypt the resulting encoded samples using three
Advanced Encryption Standard 128-bit counter-mode cipher streams. Clients send packets of 1,440 encrypted samples (90
ms of audio) over Wi-Fi 802.11ac and through the Internet to proxy servers in the Amazon Elastic Cloud that decode, add,
and clip the sample streams without decrypting them. The resulting combined audio stream is reencrypted and sent back
to the clients for decryption and playback.


exceeding the 90-millisecond limit required to maintain
processing at streaming rates. Applying an LSS index
lookup over a public table [7] of precomputed results for
the decode-mix-clip-encode function let us reduce this
delay to 25 milliseconds, allowing sufficient time to
meet the 90-millisecond goal and compensate for network
delays between handsets and servers. With this
optimization, we achieved streaming throughput for up
to four voices at 16 Kbps audio rates, enabling users to
communicate clearly.
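
The idea of replacing the decode-mix-clip-encode computation with a table lookup can be shown on plaintext values. The sketch below rests on several assumptions: it uses the continuous μ-law companding curve with μ = 255 rather than any particular codec's segmented encoding, it mixes only two streams, and it performs an ordinary (non-oblivious) lookup. The layout of the table used in the prototype is not described here, so the 256 x 256 shape is purely illustrative.

```python
import math

MU = 255.0  # standard mu-law parameter for 8-bit companding


def mulaw_encode(x: float) -> int:
    """Compress a linear sample in [-1, 1] to an 8-bit code (continuous mu-law curve)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255.0))


def mulaw_decode(code: int) -> float:
    """Expand an 8-bit code back to a linear sample in [-1, 1]."""
    y = code / 255.0 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def decode_mix_clip_encode(code_a: int, code_b: int) -> int:
    """Reference computation: decode both codes, add, clip, and re-encode."""
    mixed = mulaw_decode(code_a) + mulaw_decode(code_b)
    clipped = max(-1.0, min(1.0, mixed))
    return mulaw_encode(clipped)


# Precompute every possible result once; the table itself contains no secrets.
TABLE = [[decode_mix_clip_encode(a, b) for b in range(256)] for a in range(256)]


def mix_lookup(code_a: int, code_b: int) -> int:
    """Replace the arithmetic above with a single indexed read."""
    return TABLE[code_a][code_b]
```

The table trades a fixed amount of public, precomputed memory for the logarithm, addition, and clipping steps, which are the expensive parts to evaluate on shared data.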

We  used  16-core  (C3  size)  Amazon  cloud  servers  as 
proxies,  resulting  in  roughly  80  percent  CPU  utilization. 
In  contrast,  plaintext  processing  at  this  performance 
level  requires  only  a  small  portion  of  a  single  CPU  core. 
Memory  use  was  small  and  not  a  constraining  factor. 
Network  bandwidth  available  in  our  Amazon  cloud 
instances  was  sufficient  with  no  special  optimization. 

In the absence of collusion among the proxies, our
solution provides two layers of AES 128-bit security at
each proxy. Each proxy receives metashares encrypted
with three AES 128-bit counter-mode cipher streams
yet has access to only one of these cipher streams. Thus,
adversaries observing from any one proxy can learn nothing
of the plaintext audio samples used as input. Adversaries
observing from the coordination server can learn
nothing about the input from the metashares it conveys,
because that server holds none of the cipher streams used
for encryption and decryption. Because each proxy adds
new layers of encryption (using cipher streams to which
the coordination server has no access) to the result shares
it sends back to the coordination server, that server similarly
can learn nothing of the computation result.

FHE-based VoIP. We developed an FHE-based approach
to secure VoIP teleconferencing that requires only a single
proxy. This advance is built on a vocoder technology
that takes voice samples from each client as input and
encodes those samples as vectors of integers that are
then encrypted. This vocoder is linear and can be used
with an additive HE scheme to provide an encrypted
VoIP teleconferencing capability. Encoded voice samples
are encrypted at each iPhone client with the client’s
public key, using the additive HE scheme.

For our prototype, all clients use the same key,
because our focus is on demonstrating the practical
feasibility of an FHE computation rather than on well-understood
security concerns. The resulting ciphertexts
are sent to a VoIP mixer that queues and adds the
ciphertext from the clients without decrypting the data
or sharing keys. The resulting added ciphertext is sent
Figure 4. Email border guard system architecture. In the LSS version of the border
guard,  the  mail  server  connects  to  three  proxies  (not  shown  in  the  figure).  An 
email  client  plug-in  computes  a  sent  message's  metashare  using  key  material 
shared  a  priori  with  the  proxies  and  sends  the  metashare  to  the  mail  server, 
which  distributes  it  to  the  proxies.  The  proxies  collectively  search  the  encrypted 
message,  producing  shares  of  a  Boolean  indication  of  a  regular  expression 
match.  In  the  FHE  version,  the  client  homomorphically  encrypts  the  message 
and  sends  it  to  a  single  proxy  that  searches  the  encrypted  message  for  matches 
with  a  predefined  set  of  strings,  also  producing  a  Boolean  indication  of  a  match. 
The  mail  server  (in  the  LSS  case)  or  the  client  (in  the  FHE  case)  receives  the 
computation  result  and  uses  it  to  determine  whether  to  forward  the  message. 


back to the clients. When decrypted with the clients’
private key using the additive homomorphic decryption
scheme, decoded using our decoding scheme, and
played back to the clients, the resulting audio is a mix of
all the clients’ audio streams.
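
Mixing by ciphertext addition can be demonstrated with a toy scheme. The sketch below is not the scheme used in the prototype; it is a deliberately tiny, symmetric, LWE-style construction with made-up parameters (N, Q, T, and the noise bound are all assumptions), and it uses Python's `random` module, which is not suitable for cryptography. It illustrates only the property the mixer relies on: adding ciphertexts component-wise yields a ciphertext of the sum of the plaintexts.

```python
import random

N = 16           # toy dimension, far too small for any real security
Q = 1 << 32      # ciphertext modulus
T = 1 << 16      # plaintext modulus: the sum of all mixed samples must stay below T
DELTA = Q // T   # scaling factor that lifts the message above the noise


def keygen(rng):
    """Shared secret key: a random vector mod Q."""
    return [rng.randrange(Q) for _ in range(N)]


def encrypt(key, message, rng):
    """Return (a, b) with b = <a, key> + noise + DELTA * message (mod Q)."""
    a = [rng.randrange(Q) for _ in range(N)]
    noise = rng.randint(-4, 4)
    inner = sum(x * s for x, s in zip(a, key))
    return a, (inner + noise + DELTA * (message % T)) % Q


def add(c1, c2):
    """Homomorphic addition: component-wise sum of two ciphertexts, no key needed."""
    a = [(x + y) % Q for x, y in zip(c1[0], c2[0])]
    return a, (c1[1] + c2[1]) % Q


def decrypt(key, ciphertext):
    """Remove the mask, then round away the accumulated noise."""
    a, b = ciphertext
    noisy = (b - sum(x * s for x, s in zip(a, key))) % Q
    return ((noisy + DELTA // 2) // DELTA) % T


rng = random.Random(2015)
key = keygen(rng)                      # all clients share this key
client_samples = [1200, 35000, 7]      # one encoded sample per client
ciphertexts = [encrypt(key, m, rng) for m in client_samples]

mixed = ciphertexts[0]
for c in ciphertexts[1:]:
    mixed = add(mixed, c)              # the mixer never sees the key or plaintext

mixed_plain = decrypt(key, mixed)
```

Each addition also adds the noise terms, so the number of streams that can be mixed is bounded by how much noise fits below DELTA / 2; with these toy parameters that bound is far above three clients.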

Our FHE-based VoIP uses a prototype architecture
similar to the LSS-based VoIP teleconferencing capability,
but with lower end-to-end latency. When we
ran this system with a server in Virginia and clients in
Massachusetts, the total latency was on the order of 80
milliseconds, with the latency roughly split among communication,
encryption, and decryption. The mixing
latency was nearly trivial, taking less than 1 millisecond.

With our FHE-based approach, no keys are stored
on the teleconference server, so privacy is preserved
even if adversaries view all communication links and
server operations. Trust in the communication links
or teleconference server isn’t required to provide privacy.
The security level provided in the current demo
is roughly at the level of AES 128-bit encryption, but
parallels between the security levels of our encryption
scheme and other current standards aren’t exact. We
can increase our teleconference capability’s security
level arbitrarily at the expense of bandwidth requirements
or voice quality by modifying the sampling rate
and dynamic range of the sampled voice data.


Email  Border  Guards 

Providing privacy using email encryption and achieving
information security using trusted-party email filtering
at network boundaries are mutually exclusive
goals. Either email must be decrypted to verify compliance
to InfoSec policies (compromising privacy), or
those policies must be enforced by each user prior to
message encryption (compromising trust in filtering).
We explored solutions to this problem by studying
applications in which transaction throughput is important.
In our solutions, users encrypt email messages
on their trusted computer. The messages are sent to an
untrusted mail server for forwarding to a destination.
This mail server also acts as a border guard, checking
each email message for certain content and passing it
on to its destination only if that content is absent. The
border guard performs this content checking without
decrypting the messages.

LSS-based regular expression search email guard. We use
the Claws email client and a typical email server, along
with plug-ins to each via standard APIs, to search each
outbound encrypted email for occurrences of text that
match a set of prespecified regular expressions, forwarding
messages that do not include such matches and
rejecting those that do.

Figure 4 shows our system architecture. In the LSS
version, the mail server connects to three proxies that
perform the LSS computation (not shown in the figure).
When a user sends a message, a plug-in to the Claws client
computes the message’s metashare using key material
shared a priori with the proxies. The email client
sends the metashare to the mail server, where a Milter
(www.milter.org) plug-in distributes it to the proxies,
each of which derives its share. The regular expression
set is compiled into a Boolean circuit and distributed
to the proxies in advance. The proxies collectively compute
the regular expression search on the message, using
an adaptation of a mechanism that transforms regular
expressions into finite automata [8]. Each server produces
one share of the Boolean indication of whether any
regular expressions match against any portion of the
encrypted message corpus. Our Milter plug-in combines
these shares to obtain a plaintext Boolean answer,
which it passes to the mail server. The mail server then
accepts and forwards the message, or it drops the message
and informs the sender’s client, as appropriate.
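
The automaton-based matching step can be illustrated on plaintext. The sketch below is an assumption-laden simplification: it handles a single literal pattern rather than a general regular expression, the pattern "SECRET" is a made-up example of a marking, and it runs on cleartext characters, whereas the proxies evaluate an equivalent Boolean circuit on shares. It shows how a precomputed transition table reduces the search to one table step per message character and one final Boolean.

```python
def build_dfa(pattern: str):
    """Build transitions[state][char] -> state for 'text contains pattern'.

    State k means the last k characters read equal the first k pattern
    characters; state len(pattern) is accepting and absorbing.
    """
    m = len(pattern)
    alphabet = sorted(set(pattern))
    transitions = []
    for state in range(m + 1):
        row = {}
        for ch in alphabet:
            if state == m:
                row[ch] = m
                continue
            # Longest pattern prefix that is a suffix of (matched text + ch).
            candidate = pattern[:state] + ch
            k = min(m, len(candidate))
            while k > 0 and candidate[-k:] != pattern[:k]:
                k -= 1
            row[ch] = k
        transitions.append(row)
    return transitions


def contains(transitions, text: str) -> bool:
    """Run the automaton over the text and report whether it accepted."""
    accept_state = len(transitions) - 1
    state = 0
    for ch in text:
        if state == accept_state:
            break
        # Characters outside the pattern alphabet cannot extend any prefix.
        state = transitions[state].get(ch, 0)
    return state == accept_state


dfa = build_dfa("SECRET")          # hypothetical marking to filter on
blocked = contains(dfa, "TOP SECRET//NOFORN")
allowed = contains(dfa, "Lunch at noon?")
```

Because the table depends only on the pattern, it can be built once and distributed in advance, matching the description of the circuit being distributed to the proxies before any message arrives.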

We performed several experiments on this system,
optimizing the resulting Boolean circuit to consider different
numbers of regular expression characters. Processing
16 message characters at a time was the point of
diminishing returns. For a typical 1-Kbyte email ASCII
message and a set of regular expressions that roughly
represents classification markings that might be used in
a government setting, checking an email message took
approximately 90 seconds using quad-core, 3-GHz Intel
architecture blade servers as proxies. CPU utilization
averaged approximately 90 percent during processing,
and memory utilization was minimal.

FHE-based encrypted keyword search email guard. We
developed a prototype FHE application that searches
for encrypted keywords in encrypted text. This method
relies on a homomorphic string comparison operation
that’s repeated for all keywords in all locations of an
encrypted message. As in the LSS method, we imported
this technology into an email guard-type scenario to
provide outsourced email filtering based on email clients’
keywords of interest. Because the result of the
string comparison is only available to the mail server
in encrypted form, our protocol sends the encrypted
result back to the client, where it’s decrypted to reveal
whether the message should be sent. Thus, our prototype
assumes an honest sender and requires an extra
round-trip between client and server. Figure 4 shows a
sketch of this technology.
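
A string comparison built only from additions and multiplications mod p can be sketched on plaintext values. The comparison circuit used in the prototype is not described here, so the following is one textbook construction offered as an assumption: it tests equality with Fermat's little theorem, and the prime P = 257 and the keyword list are illustrative. Every step uses only operations that have EvalAdd and EvalMult counterparts, which is what would allow the same logic to run on encrypted characters.

```python
P = 257  # hypothetical prime plaintext modulus; byte values 0..255 fit below it


def eq_mod_p(a: int, b: int, p: int = P) -> int:
    """Return 1 if a == b (mod p), else 0, using only ring operations.

    For prime p, (a - b)^(p - 1) is 1 when a != b and 0 when a == b.
    """
    return (1 - pow((a - b) % p, p - 1, p)) % p


def match_at(text: bytes, keyword: bytes, pos: int, p: int = P) -> int:
    """Product of per-character equality bits: 1 only if every character matches."""
    result = 1
    for i, k in enumerate(keyword):
        result = (result * eq_mod_p(text[pos + i], k, p)) % p
    return result


def contains_keyword(text: bytes, keywords, p: int = P) -> int:
    """Repeat the comparison for all keywords at all locations; return a 0/1 flag."""
    no_match = 1
    for kw in keywords:
        for pos in range(len(text) - len(kw) + 1):
            no_match = (no_match * (1 - match_at(text, kw, pos, p))) % p
    return (1 - no_match) % p


keywords = [b"secret", b"classified"]   # hypothetical keywords of interest
flag_hit = contains_keyword(b"send the secret plans", keywords)
flag_miss = contains_keyword(b"lunch at noon", keywords)
```

The chain of multiplications in this construction grows with the modulus and the keyword length, which gives a feel for why the supported depth of computation is a central parameter for this application.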

We’re currently running this implementation at a
low security level (δ = 1.08) to enable the email system
to be interactive with fast response times. Our initial
implementation uses a ring dimension of n = 512 and
encrypts emails with a supported depth of computation
d = 12. This results in an effective ciphertext modulus
q represented with 430 bits. With these parameter configurations,
we can search over encrypted paragraph-long
emails with five- to six-character words in less than a
minute. Result decryption runs in a matter of seconds.

We could tune this FHE-based email guard to an
extremely secure setting (δ = 1.0055 or less) using our
current implementation with a similar depth of computation.
We would choose a ring dimension of 16,384
and an effective ciphertext modulus q represented with
521 bits. Encryption runtime at these settings is on the
order of minutes, encrypted message filtering would
take hours on a nonparallelized server, and decryption
would take a matter of seconds.

In a world in which Bob and Alice need to work
together but are no longer comfortable sharing their
secrets, or where Alice needs Charlie’s help to process
data but feels uncomfortable with Charlie (or the ever-lurking
Eve) seeing the data, secure computation holds
promise. However, secure computation methods differ;
each has its distinct tradeoffs, security models, and caveats.
Our experiments show that some practical applications
are emerging, but substantive work remains to be
done to make secure computation practical for broad
classes of applications. ■
Abstract: In this paper we present a practical new approach
to scalable and secure VoIP teleconferencing where data remains
encrypted even at the VoIP teleconferencing server. Until now,
VoIP teleconferencing services have required either 1) all
teleconferencing clients to maintain point-to-point links with other
clients or 2) a teleconferencing server which can access and
manipulate all VoIP streams unencrypted, in the clear. We present
a new approach which uses a teleconferencing server which
manipulates only encryptions of the VoIP streams, thus avoiding
the respective scalability and security issues of previous classes
of approaches. Our new approach relies on recent advances
in practical homomorphic encryption to provide end-to-end
encryption. Voice data is sampled, encoded and encrypted at the
VoIP teleconference clients, sent over a generic network such as
the open Internet to the encryption-enabled server, and mixed at
the server without decrypting the VoIP data. The encrypted result
of the mixing is sent back to the clients for decryption, decoding
and playback to the users. The homomorphic encryption basis of
our secure VoIP teleconferencing capability is a modification of
NTRU, and can provide fully homomorphic and post-quantum
features, but we only use additive homomorphic capabilities in
this work. We discuss our working prototype of this secure,
practical VoIP teleconferencing capability running on iOS clients
and the lowest-cost Amazon AWS server. Our prototype provides
full-duplex 100 kbps throughput and average 90 ms latency, which
is higher quality than many commercial VoIP services such
as Skype and GoToMeeting, besides being much more secure.
We present the design of our VoIP system, with a particular
focus on the VoIP encoding/decoding scheme and homomorphic
mixing operations. These encoding/decoding operations and the
encrypted homomorphic mixing, coupled with an efficient, usable
implementation compatible with commodity hardware, are the
primary advance we have been able to leverage to enable secure
VoIP teleconferencing with end-to-end encryption. We present
experimental results to show the scalability, performance and
voice quality trade-offs of our design and implementations when
used over local-area, national and intercontinental distances.

I.  Introduction 

There has been an unmet technological need for a scalable
capability that lets multiple geographically distributed
people converse as a group simultaneously over data
networks. This need has, until now, been partially served
either by physically secure dedicated point-to-point
communication links, as provided by dedicated circuits,
or by physically unsecured point-to-point communication
links which are made secure with point-to-point
encryption technologies [1], [2].

These prior approaches to secure communication are either
not scalable or have not adequately addressed several important
vulnerabilities. Physically secure communication links
are not feasible over broad geographic areas such as for
trans-continental and inter-continental communication.
Point-to-point encryption solutions do not scale because when there
are more than a handful of participants in a teleconference call,
a large number of point-to-point communication links are practically
difficult to set up and maintain, often leading to latency
issues which would degrade the quality of user experiences.
For these reasons, there is a need for a technology that
provides scalable, secure and practical teleconferencing services
which can be used to host multi-party negotiations, planning,
education and information distribution of a sensitive nature.

The need for scalable, secure and practical teleconferencing
services has been partially met with Voice over IP (VoIP)
teleconferencing technologies where users can converse with
one another by encoding data for transmission between users
over IP data networks. VoIP provides a fundamentally scalable
and practical approach to teleconferencing, especially with
the advent of global packet-switched information networks.
Unfortunately, existing VoIP teleconferencing capabilities such
as GoToMeeting, Skype and Mumble among others have not
been both scalable and secure against data leaking to adversaries
who wish to snoop on private or even proprietary group
communication. These technologies have been vulnerable to
man-in-the-middle attacks of various types [3].

Although modern VoIP teleconferencing technologies have
been good at protecting data in transit between clients and a
server, all bets are off when the data reaches a server where it
needs to be mixed. The majority of widely used existing VoIP
teleconferencing capabilities require a central VoIP server to
mix all of the VoIP signals from clients which are then sent
back to the clients. Until now this has required the VoIP server
to have access to unencrypted VoIP data. That is, the VoIP
mixing operation, which merges the VoIP streams from the
clients, has until now needed to be performed in the clear, on
unencrypted VoIP data. This creates a possible opportunity
for adversaries to snoop on otherwise protected VoIP data if
the adversaries gain access to the VoIP server.

The mixing of VoIP data in the clear is adequate when
the VoIP server is fully trusted by all participants. However,
VoIP teleconferencing servers are often hosted in a semi-secure
environment, for example by commodity cloud providers such as
Amazon AWS or Microsoft Azure. Some users, such as those in
less technologically developed regions of the globe, might
not have access to low-cost cloud environments, requiring the
deployment of VoIP servers on local hardware that is less
secure against compromise or seizure.

The limitation of required server trust has until now prevented
VoIP teleconferencing technologies from
being used in less-than-fully trusted situations, such as in the
cloud, where the underlying server hardware and the cloud
providers cannot be fully trusted not to leak information. This
induces an unfortunate trade-off for this architecture:
either all participants maintain group conversations
in the clear in untrusted environments, or they pay the higher
cost of maintaining access to a trusted VoIP server if secure
teleconferencing is needed.

Taken together, these technological deficiencies point to a
need for a VoIP teleconferencing capability where VoIP data
is never decrypted except on the clients which have access
to decryption keys. Thus, unlike previous VoIP attack analyses
which focus on signaling attacks [4], [5] during VoIP call setup,
we are particularly interested in protecting against man-in-the-middle
attacks which involve compromise of the VoIP server.

In this paper we present a secure, scalable and practical
method to protect against the leakage of sensitive VoIP teleconferences
even on VoIP teleconference servers that have
been fully compromised. We provide a method for parties
to have privacy-preserving teleconferences where communication
privacy is maintained despite all communications of the
clients being observed during the teleconference, even at the
teleconference mixer. The basis of our approach is a method
for additive homomorphic encryption such that all clients have
a common private key. The clients encode their voice samples
with an additive encoding scheme, encrypt their encoded voice
data with an additive homomorphic encryption scheme, send
their encrypted voice samples to a mixer which performs an
encrypted homomorphic addition on the encrypted voice and
sends the results back to the clients. The clients then decrypt,
decode and play back the result. Our scheme relies on the pre-sharing
of a common private key for an additive homomorphic
encryption scheme, but it is possible in principle to practically
generalize beyond this pre-shared key design.
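
One requirement on an additive encoding is that the sum of several encoded samples must decode to the sum of the original samples. The sketch below illustrates that constraint on plaintext integers only; it is not the paper's encoding scheme. The function names are invented for the example, and the choice of signed 8-bit samples and four simultaneous speakers is an assumption used to make the arithmetic concrete.

```python
def encode(sample: int, p: int) -> int:
    """Map a signed sample to a residue mod p (negative values wrap upward)."""
    return sample % p


def decode(value: int, p: int) -> int:
    """Map a residue back to the centered range around zero."""
    value %= p
    return value - p if value > p // 2 else value


def min_plaintext_modulus(bits_per_sample: int, speakers: int) -> int:
    """Smallest odd modulus for which a sum of `speakers` signed samples cannot wrap."""
    max_magnitude = speakers * (1 << (bits_per_sample - 1))
    return 2 * max_magnitude + 1


def mix(samples, p: int) -> int:
    """What additive mixing computes on plaintexts: a sum of encodings mod p."""
    return sum(encode(s, p) for s in samples) % p


p_ok = min_plaintext_modulus(bits_per_sample=8, speakers=4)   # hypothetical sizing
loud = [127, 127, 127, 127]
quiet = [-128, -128, -128, -128]

mixed_loud = decode(mix(loud, p_ok), p_ok)
mixed_quiet = decode(mix(quiet, p_ok), p_ok)

# With a modulus sized for a single speaker, the same mix wraps around.
p_small = min_plaintext_modulus(bits_per_sample=8, speakers=1)
wrapped = decode(mix(loud, p_small), p_small)
```

The headroom needed grows with both the dynamic range of the samples and the number of simultaneous speakers, so those two quantities have to be fixed before the plaintext space of the encryption scheme is chosen.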

We implemented this scheme to run on commodity iPhone
clients and the current lowest-cost Amazon EC2 server. Our
capability provides end-to-end encryption of all VoIP data from
the VoIP clients hosted on the phones with no decryption at
the server. This implementation is secure, relying on a post-quantum
encryption scheme which protects even against quantum
computing attacks on the encrypted data. This capability
also provides relatively high sound quality with full-duplex
100 kbps data rates. This initial implementation is intended
to be a proof-of-concept capability, with the possibility of
improving upon this technology with existing key management
technologies [6] and session initiation technologies [7] with
additional engineering investment and little or no research risk.

Our new approach relies on recent advances in practical
homomorphic encryption to provide end-to-end encryption.
With this approach, encrypted VoIP data is mixed on a VoIP
teleconference server without decrypting data at the server or
sharing decryption keys with the server. We present our working
prototype of this secure, practical VoIP teleconferencing
capability running on iOS clients and the lowest-cost Amazon
AWS server and provide experimental analyses of this implementation.
Our innovation is in the VoIP encoding/decoding
scheme and homomorphic mixing operations, coupled with an
efficient, usable implementation compatible with commodity
hardware.

The paper is organized as follows. Section II discusses the
design goals of our encrypted VoIP teleconferencing system.
Section III discusses the overall design of our end-to-end encrypted
VoIP teleconferencing capability. Section V discusses
the engineering trade-offs associated with parameter selection.
Section VI discusses how we implemented our design to
run on iOS clients and Linux servers. Section VII presents
experimental results of deploying our end-to-end encrypted
VoIP teleconferencing capability on the open Internet. Section
VIII discusses related work on relevant technologies. Section
IX presents a discussion of our capability and ongoing work.

II.  Design  Goals 

We identified several design goals and metrics of performance
with which to evaluate and reason over our end-to-end
encrypted VoIP teleconferencing designs and implementations.
Our primary high-level design goals and metrics are:

1) Sound Quality: The end-to-end encrypted VoIP teleconferencing
capability should provide sound quality
at least as good as a Public Switched Telephone
Network (PSTN), preferably with full-duplex.

2) Latency: The end-to-end encrypted VoIP teleconferencing
capability should provide an end-to-end latency
ideally of less than 100 ms for trans-continental
VoIP teleconference sessions, a generally accepted
reasonable latency for VoIP technologies, but more
latency is acceptable for inter-continental operations.

3) Scalability: The end-to-end encrypted VoIP teleconferencing
capability should be able to support four
people speaking simultaneously while tens of participants
listen without degradation in sound quality or
latency.

4) Secure: The end-to-end encrypted VoIP teleconferencing
capability should provide an encryption work
factor roughly at least as good as the work factor
for AES-128. This means that the VoIP data, when
encrypted, should require at least as much computational
effort to obtain the unencrypted data without
a key as is needed for AES-128, a commonly used
point-to-point secure encryption technology.

5) Resource Efficient: FHE schemes have been known
to require encrypted data which is much larger than
the original source data. Early schemes provided a
ciphertext expansion of several orders of magnitude
larger than the source data. The end-to-end encrypted
VoIP teleconferencing capability should ideally require
less than an order of magnitude ciphertext
expansion.

6) Wide Geographic Area: The end-to-end encrypted
VoIP teleconferencing capability should operate with
users and the VoIP mixing server over a wide geographic
area, ideally trans-continental if not inter-continental,
without an unacceptable degradation in
sound quality or latency.

7) Portable: The end-to-end encrypted VoIP teleconferencing
capability should be easily ported to other
client and server types.
Fig. 1: High-Level VoIP Teleconferencing Design


Fig.  2:  High-Level  Data  Client  Internal  Data  Processing 


8) Easily Deployable: The end-to-end encrypted VoIP
teleconferencing capability should be easy to deploy,
such as with small binaries.

9) Usable: The end-to-end encrypted VoIP
teleconferencing capability should be intuitive and easy to use.

10) Extensible: The end-to-end encrypted VoIP
teleconferencing capability should be easy to modify to add
additional and more advanced functionality at a later
date.

III.  Teleconferencing  Architecture 

Figure 1 shows a high-level illustrative application
of this privacy-preserving VoIP teleconferencing technology
with end-to-end encryption. Each of the clients samples its user's
voice data, encodes it, encrypts it and sends the result to the
VoIP mixer. The mixer sends a result back which is then
decrypted, decoded and played back to the clients' users. Any
encryption system could be used that supports an additive
homomorphism and can be implemented in a practical
manner. A representative scheme that supports additive
homomorphisms is NTRU, which can be made both Somewhat
Homomorphic (SHE) and Fully Homomorphic (FHE) in
addition to additively homomorphic.

Our approach uses a shared secret key, but designs that
generalize beyond this initial shared secret key arrangement
are possible. Input voice streams from clients are sampled
and homomorphically encrypted using the client's public key.
The encrypted voice samples are sent to an FHE-enabled VoIP
server that does not have access to encryption keys. The VoIP
server combines and balances the encrypted audio feeds. The
combined output is then forwarded to the client handsets,
where it is decrypted and played back for the user. Our
FHE-based solution processes streaming audio at 10 kBytes/s per
voice.

The output of the processing is sent to the client, where it
is decrypted using the client's private key. No keys are stored
on the teleconference server, so privacy is preserved even if an
adversary views all communication links and operations on the
server. No trust of the communication links or teleconference
server is required to provide privacy. The level of security
provided in the current prototype is roughly at the level of
AES-128, but parallels between the security levels of the
encryption scheme and other current standards are not exact.
We can increase the security of our teleconference capability
to be arbitrarily higher at the expense of voice quality by
decreasing sampling rate and dynamic range.

Figure 2 shows how the clients support data flows
internally. In the top of the diagram, data from the microphone is
sampled and fed to the encoder, encrypted using an additive
homomorphic encryption scheme and sent to the mixer. As
seen in the bottom of the figure, the result returned from the
mixer is decrypted, decoded and played back over a speaker.

Figure 3 shows how the VoIP mixer takes encrypted input
from various clients and returns a common output. For a
representative VoIP system with clients $(c_1, c_2, c_3, \ldots, c_m)$,
a client $c_i$ would want
$(c_1 + c_2 + \ldots + c_{i-1} + c_{i+1} + \ldots + c_m)$.
This summation can be performed in a tree fashion as
illustrated in Figure 3. For our representative NTRU scheme, the
ciphertexts are vectorized in blocks of $m$, and all additions are
performed modulo some large integer $q$ pre-specified by the
key generator.
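The mixing step can be illustrated with a minimal sketch. It assumes each ciphertext is just a list of integers modulo q (a stand-in for the real ciphertext vectors), and the function names `eval_add`, `tree_sum` and `mix_for_client` are hypothetical, chosen here for illustration only.

```python
def eval_add(c0, c1, q):
    """Coefficient-wise addition of two ciphertext vectors modulo q."""
    return [(a + b) % q for a, b in zip(c0, c1)]


def tree_sum(ciphertexts, q):
    """Add a non-empty list of ciphertexts pairwise, layer by layer."""
    layer = list(ciphertexts)
    while len(layer) > 1:
        nxt = [eval_add(layer[k], layer[k + 1], q)
               for k in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:          # an odd ciphertext is carried up unchanged
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]


def mix_for_client(ciphertexts, i, q):
    """Return the sum of every stream except client i's own stream."""
    others = ciphertexts[:i] + ciphertexts[i + 1:]
    return tree_sum(others, q)


if __name__ == "__main__":
    q = 97
    streams = [[1, 2], [3, 4], [90, 95]]
    for i in range(len(streams)):
        print(i, mix_for_client(streams, i, q))
```

The tree shape only affects the order of additions; because addition modulo q is associative, a simple left-to-right loop would give the same result.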

Our encoder/decoder is additive so that we can rely on
an additive homomorphism such as the EvalAdd operation
to mix VoIP signals. Because we require only an efficient
secure EvalAdd operation to support encrypted VoIP mixing,
our design builds on the recent efficient FHE design and
implementation discussed in [8]. We simplified this prior work
by removing the ability to support EvalMult operations.
Because we only need to support much smaller
circuits, we do not need the parallelism capabilities
discussed in [8] for our VoIP application and integration with the
existing Mumble/Murmur open-source VoIP systems. We also
use much smaller parameters than the designs advocated in [8]
because we require greatly reduced functionality.
Thus, the basis of our encryption approach is a special limited
version of FHE called Additive Homomorphic Encryption,
which allows an untrusted computation host to compute the
encrypted sum of encrypted integers.

Fig. 3: Encrypted VoIP Server Mixing for Three Clients

A. Client Vocoder

We have developed a vocoder technology which takes
voice samples from a client and encodes the voice samples as
vectors of integers. This vocoder is linear so that it can be used,
for example, with an additive homomorphic encryption scheme
to provide an encrypted VoIP teleconferencing capability. In
this example, the encoded voice samples are encrypted using
the additive homomorphic encryption scheme. These operations
are performed on multiple clients. The resulting ciphertexts
are sent to a VoIP mixer which queues and adds the
ciphertexts from the clients. The resulting added ciphertext can be
sent back to the clients. When decrypted with the additive
homomorphic decryption scheme, decoded using our decoding
scheme and played back to the clients, the resulting audio is
a mixing of the audio from the clients.

Our encoding goal is to convert a length-$m$ data frame
of $\gamma$-bit VoIP samples into a length-$n$ frame of integers
with the property that
$\mathrm{Encode}(input_1) + \mathrm{Encode}(input_2) = \mathrm{Encode}(input_1 + input_2)$.
As seen in the left hand side of
Figure 4, we split the length-$m$ sample input into multiple
blocks of $n = 2^{\lfloor \log_2(m) \rfloor}$-length vectors and a single
$\mathrm{mod}(m - n)$-length vector if $\mathrm{mod}(m - n) > 0$. The first
step is to shift the samples so they are centered around
0, mod $2^{\gamma}$. For the $i$-th block of samples, we multiply the
integers in this block by $2^{(\gamma + 2)(i - 1)}$. We also pad the
$m - n$ block of samples with $2n - m$ 0s so this vector is
$n$ samples long. As seen on the right hand side of Figure
4, we sum these vectors. These operations are all highly
efficient as they only involve splitting vectors, multiplication
by two and bitwise concatenation, which are all extremely
efficient to implement. This result is the encoded vector and
has the property that
$\mathrm{Encode}(input_1) + \mathrm{Encode}(input_2) = \mathrm{Encode}(input_1 + input_2)$.
This encoded data is subsequently
used for encryption.
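The layered encoding can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' codec: samples are taken as non-negative integers (the centering step is omitted), each layer is assumed to occupy gamma + 2 bits, and the function name `encode` and the parameter `guard_bits` are hypothetical.

```python
def encode(frame, gamma, guard_bits=2):
    """Pack a frame of gamma-bit samples into n = 2^floor(log2(m)) integers.

    The first n samples form layer 0. Any remaining m - n samples are
    zero-padded to length n and placed (gamma + guard_bits) bits higher.
    """
    m = len(frame)
    n = 1 << (m.bit_length() - 1)               # n = 2^floor(log2(m))
    blocks = [list(frame[:n])]
    if m > n:
        tail = list(frame[n:])
        blocks.append(tail + [0] * (2 * n - m))  # pad with 2n - m zeros
    layer_bits = gamma + guard_bits
    encoded = [0] * n
    for i, block in enumerate(blocks):           # i = 0 is the lowest layer
        for j, sample in enumerate(block):
            encoded[j] += sample << (layer_bits * i)
    return encoded


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]
    b = [10, 20, 30, 40, 50, 60]
    summed = [x + y for x, y in zip(a, b)]
    lhs = [x + y for x, y in zip(encode(a, 10), encode(b, 10))]
    print(lhs == encode(summed, 10))
```

Because every step is a shift or an addition, the map is linear, which is what makes the additive property hold for any pair of frames of equal length.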

Figure 5 shows our decoding process. On the right hand
side of this figure we take the input vector. We make copies of
this block and perform an integer division by $2^{(\gamma + 2)(i - 1)}$ for
the $i$-th block. We then concatenate these vectors and return
the result. As with the encoding operation, these operations
are all highly efficient as they only involve splitting vectors,
multiplication by two and bitwise concatenation, which are all
extremely efficient to implement.
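A matching decoding sketch is shown below. It assumes non-negative packed values with each layer occupying gamma + 2 bits; the masking step that isolates one layer is an assumption added here to make the example complete (the text above mentions only the integer division), and the name `decode` is hypothetical.

```python
def decode(encoded, m, gamma, guard_bits=2):
    """Unpack a length-n encoded vector back into m samples."""
    layer_bits = gamma + guard_bits
    mask = (1 << layer_bits) - 1
    n = len(encoded)
    layers = -(-m // n)                     # ceil(m / n)
    samples = []
    for i in range(layers):
        # Integer division by 2^(layer_bits * i), then keep one layer's bits.
        samples.extend((v >> (layer_bits * i)) & mask for v in encoded)
    return samples[:m]                      # drop the zero padding


if __name__ == "__main__":
    packed = [1 + (5 << 12), 2 + (6 << 12), 3, 4]
    print(decode(packed, 6, 10))
```

With 10-bit samples and two guard bits, a layer can hold the sum of up to four streams before it would spill into the next layer.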

IV.  Homomorphic  Encryption  and  Key  Generation 

In this section we describe the additive homomorphic
cryptosystem we use to construct the end-to-end encrypted
VoIP capability built on [8]. This cryptosystem is very similar
to the NTRU system [9], though it was not until recently
that its homomorphic properties were noticed independently
by Lopez-Alt et al. [10] and Gentry et al. [11]. A more general
version of this cryptosystem was discussed in [8], but we
discuss here a more limited version of the cryptosystem of
[8] which is simplified for more efficient end-to-end VoIP
encryption.

Fig. 4: Encrypted VoIP Encoding

The discussion of this simplified cryptosystem has a high
degree of overlap with the more general cryptosystem. Our
simplifications reside primarily in the encryption and
decryption operations, but for the sake of completeness we include
the full key generation and evaluation addition operations,
which are also modified, but to a lesser extent. The modifications
are primarily in the avoidance of any ciphertext decomposition
to parallelize operations when the ciphertext modulus $q > 2^{64}$.
We have found that we can parameterize the cryptosystem
to support the vector addition of adequately large plaintext
vectors such that a larger ciphertext modulus is not
needed. Because we can limit ourselves to 64-bit
operations, our simplified cryptosystem can be implemented
to run highly efficiently on native 64- and 32-bit processors
without the parallelism advances obtained in [8] for more
efficient, more general computations.

The simplified cryptosystem is based around the
manipulation of power-of-2 cyclotomic rings for ease and efficiency
of implementation. For ring dimension $n$ which is a power of
2, define the ring $R = \mathbb{Z}[x]/(x^n + 1)$ (i.e., integer polynomials
modulo $x^n + 1$). For a positive integer $q$, define the quotient
ring $R_q = R/qR$ (i.e., integer polynomials modulo $x^n + 1$,
with coefficients from $\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$).

For the cryptosystem the message space is $R_p$ for some
integer $p \geq 2$. We use a mod-$q$ Chinese Remainder Transform
(CRT) representation of elements to provide fast addition.
These CRT representations are discussed extensively in [12].

The basic operations of the scheme are as follows:

• KeyGen: choose a short $f \in R$ such that $f = 1 \bmod p$
and $f$ is invertible modulo $q$, and a short $g \in R$.
Output the public key $pk = h = g \cdot f^{-1} \bmod q$ and
the secret key $sk = f$.

We choose the short elements $f$ and $g$ from centered
discrete Gaussians. E.g., we can let $f = p \cdot f' + 1$
for some Gaussian-distributed $f'$. Note that such an $f$
will have expectation (center) 1.

• Enc($pk = h$, $\mu \in R_p$): choose a short $r \in R$ and a
short $m \in R$ such that $m = \mu \bmod p$. Output
$c = p \cdot r \cdot h + m \bmod q$.

Concretely, $m$ can naively be chosen as $m = p \cdot m' + \mu$
for a Gaussian-distributed $m'$, but again, such an $m$
is not zero-centered.

• Dec($sk = f$, $c \in R_q$): compute $\bar{b} = f \cdot c \bmod q$, and
lift it to the integer polynomial $b \in R$ with coefficients
in $[-q/2, q/2)$. Output $\mu = b \bmod p$.

The additive homomorphic operations are defined as follows:

• EvalAdd($c_0$, $c_1$): output $c = c_0 + c_1 \bmod q$.
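The operations above can be exercised end to end with a toy sketch. Everything numeric here is an assumption made for illustration and offers no security: n = 8, p = 16, a prime q = 65537, ternary noise in place of discrete Gaussians, a schoolbook negacyclic product in place of the CRT representation, and inversion of f by Gaussian elimination. The three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
import random


def polymul(a, b, q):
    """Multiply two polynomials in Z_q[x]/(x^n + 1)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:                       # x^n = -1 wraps with a sign flip
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


def polyinv(f, q):
    """Invert f in R_q for prime q by solving f * x = 1; None if singular."""
    n = len(f)
    cols = [polymul(f, [int(i == j) for i in range(n)], q) for j in range(n)]
    rows = [[cols[j][i] for j in range(n)] + [int(i == 0)] for i in range(n)]
    for c in range(n):
        piv = next((r for r in range(c, n) if rows[r][c]), None)
        if piv is None:
            return None
        rows[c], rows[piv] = rows[piv], rows[c]
        inv = pow(rows[c][c], -1, q)
        rows[c] = [v * inv % q for v in rows[c]]
        for r in range(n):
            if r != c and rows[r][c]:
                k = rows[r][c]
                rows[r] = [(x - k * y) % q for x, y in zip(rows[r], rows[c])]
    return [rows[i][n] for i in range(n)]


def small(n, rng):
    """Short element with coefficients in {-1, 0, 1} (Gaussian stand-in)."""
    return [rng.choice((-1, 0, 1)) for _ in range(n)]


def keygen(n, p, q, rng):
    while True:
        f = [p * c for c in small(n, rng)]
        f[0] += 1                                   # f = p * f' + 1
        finv = polyinv(f, q)
        if finv is not None:
            return polymul(small(n, rng), finv, q), f   # (h, f)


def encrypt(h, mu, p, q, rng):
    n = len(h)
    m = [p * e + u for e, u in zip(small(n, rng), mu)]  # m = p * m' + mu
    prh = polymul([p * c for c in small(n, rng)], h, q)
    return [(a + b) % q for a, b in zip(prh, m)]


def eval_add(c0, c1, q):
    return [(a + b) % q for a, b in zip(c0, c1)]


def decrypt(f, c, p, q):
    b = polymul(f, c, q)
    lifted = [v - q if v > q // 2 else v for v in b]    # lift to [-q/2, q/2)
    return [v % p for v in lifted]


if __name__ == "__main__":
    n, p, q, rng = 8, 16, 65537, random.Random(0)
    h, f = keygen(n, p, q, rng)
    mu0, mu1 = [1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 8, 1, 1, 1, 1]
    c = eval_add(encrypt(h, mu0, p, q, rng), encrypt(h, mu1, p, q, rng), q)
    print(decrypt(f, c, p, q))
```

With these toy sizes the noise in a sum of two ciphertexts stays below q/2 in the worst case, so decryption of the sum returns the coefficient-wise sum of the messages modulo p.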

V.  Parameter  Selection  Tradeoffs 

We  need  to  choose  parameters  for  both  the  vocoder  and 
the  cryptosystem  so  that: 

•  VoIP  signal  data  is  encoded  into  VoIP  plaintext. 
Fig. 5: Encrypted VoIP Decoding


•  The  VoIP  plaintext  can  be  securely  encrypted  into 
VoIP  ciphertext. 

•  The  summation  of  multiple  VoIP  ciphertexts  can  be 
successfully  decrypted  back  into  VoIP  plaintext. 

•  The  output  VoIP  plaintext  can  be  decoded  into  an 
undistorted  VoIP  signal. 

• All these operations need to be run efficiently on
commodity  hardware,  such  as  64-  and  32-bit  ARM 
and  x86  processors. 

Generally, these concerns mean that:

• The bitwidth of the VoIP data $P$ needs to be
sufficiently large so that given VoIP integer signal vectors
from $y$ speakers $v_1, v_2, \ldots, v_y$, we are guaranteed that
$v_1 + v_2 + \cdots + v_y = (v_1 + v_2 + \cdots + v_y \bmod P)$.

• The number of layers in the encodings (and hence
the ring dimension $n$) and the plaintext modulus
$p = 2^{x}$ need to be sufficiently large with respect
to $P$ so that for the encodings $z_1, z_2, \ldots, z_y$ where
$z_i = \mathrm{encode}(v_i)$,
$z_1 + z_2 + \cdots + z_y = (z_1 + z_2 + \cdots + z_y \bmod p)$.

• The ciphertext modulus needs to be sufficiently small
that we can support computations on the ciphertext
efficiently. For modern smart phones this means that
the ciphertext modulus is at most $2^{64}$, so we can use
native 64-bit computations.

• The selection of parameters needs to provide a
non-trivial root Hermite factor to provide security
guarantees.

The selection of the ring dimension n and ciphertext
modulus q parameters depends heavily on the desired security
level and the plaintext modulus p. The plaintext modulus p
depends on the VoIP data modulus P, the number of VoIP
streams that need to be mixed without distortion y and the VoIP
data bitwidth P. We capture the primary concerns influencing
the selection of a ring dimension n and the modulus q at a
high level as follows:

We choose to add discrete Gaussian noise to the fresh
ciphertexts where $r = 3$ represents the selected probability
distribution parameter as suggested in [8]. We have found
theoretically that the smallest modulus $q$ needs to satisfy the
expression

$q > 4 p r \sqrt{n}\, w$  (1)

in order to ensure successful decryption, where the
parameter $w \approx 4$ represents an "assurance" measure for correct
decryption (essentially, the probability of decryption failure is
bounded by the probability that a normally distributed variable
is more than $w\sqrt{2n}$ standard deviations from its mean), and $p \cdot r$
is the Gaussian parameter of the noise used in fresh ciphertexts.
(Hence $r$ is the Gaussian parameter of the underlying
NTRU-like problem.)

The most recent experimental evidence [13] suggests that
$\delta = 1.007$ would require roughly $2^{40}$ core-years on recent
Intel Xeon processors to break. Using the estimates from [14],
[15], we found that in order to achieve a security level $\delta$ for a
depth of computation $d = t - 1$ using the modulus $q$, we need
to ensure that

$n > \lg(q) / (4 \lg(\delta))$.  (2)
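As a rough numerical check of how expressions (1) and (2) interact, the sketch below evaluates them for the values quoted in this paper ($r = 3$, $w = 4$, $n = 1024$, $p = 2^{24}$). It assumes the reading $q > 4 p r \sqrt{n} w$ for (1) and $n > \lg(q)/(4 \lg(\delta))$ for (2), and the function names are hypothetical.

```python
import math


def min_log2_q(p, r, n, w):
    """log2 of the smallest ciphertext modulus allowed by expression (1)."""
    return math.log2(4 * p * r * math.sqrt(n) * w)


def root_hermite_factor(log2_q, n):
    """Smallest delta satisfying expression (2) with equality."""
    return 2 ** (log2_q / (4 * n))


if __name__ == "__main__":
    p, r, n, w = 2 ** 24, 3, 1024, 4
    bits = min_log2_q(p, r, n, w)
    delta = root_hermite_factor(bits, n)
    print(f"lg(q) > {bits:.2f}, delta = {delta:.4f}")
```

Under these assumptions the modulus needs roughly 35 bits, which fits comfortably in a native 64-bit word, and the resulting root Hermite factor rounds to 1.006.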

VI.  Implementation 

We evaluated our end-to-end encrypted VoIP capability
by implementing our vocoder and homomorphic encryption
library and then integrating them with an existing open-source
VoIP teleconferencing capability. This activity resulted in an
end-to-end encrypted VoIP client for teleconferencing clients
running in an Apple iOS environment composed of a) the
open-source Mumble VoIP client, modified and integrated with b)
a custom linear codec of our design written in ANSI C and c)
an FHE encryption library ported from Matlab to ANSI C. We
also wrote and deployed the VoIP server capability running on
Linux computing devices to perform the homomorphic mixing
operation. We describe the implementation of this capability
in this section.

A.  Codec  and  Homomorphic  Encryption  Implementation 

As with the cryptosystem design, our implementation of
the additive homomorphic encryption library is a
customization of the design introduced in [8]. We implemented our
scheme in the Mathworks Matlab environment and used the
Matlab coder toolkit [16] to generate an ANSI C library of
our implementation. We believe that additional performance
improvements could be obtained by implementing our HE
scheme natively in C.

We chose to implement our scheme in Matlab using the
Matlab fixed-point toolbox because it provides an interpreted
computation environment for rapid prototyping with native
support for vector and matrix manipulation, which simplifies
implementation development. We found the Matlab syntax to
be a natural fit for writing software to support the primitive
lattice operations needed for our CRT-based NTRU-inspired
homomorphic encryption design. The Matlab fixed-point
toolbox also provides a path toward generated HDL
implementations of our design that can be deployed for practical use
on highly parallel computing hardware such as FPGAs. Part
of our vision for the use of our SHE design is to develop an
FPGA implementation of FHE [17], [18].

We implemented the vocoder capability in native ANSI
C. We compiled this capability using the gcc tool to create a
vocoder library which we then integrated with the
homomorphic encryption library and a VoIP teleconferencing substrate.

B.  VoIP  Teleconferencing  Substrate 

Rather than construct a VoIP capability from whole cloth,
we decided to construct an end-to-end encrypted VoIP
teleconferencing capability by integrating our additive homomorphic
encryption library and our vocoder library with an existing
open-source VoIP teleconferencing library. We selected the
Mumble VoIP library (http://mumble.sourceforge.net) for this
integration because Mumble is mature, offers high sound
quality and runs on a variety of platforms.

We decided to implement our end-to-end encrypted VoIP
teleconferencing capability for iOS clients because the native
iOS development environment uses Objective-C, a superset of
ANSI C. However, even though we only developed iOS clients,
there is no reason our client library could not be integrated in
other environments such as Android, Windows, Mac or
Blackberry clients.

By integrating with the Mumble library, our end-to-end
encrypted VoIP library has the same use and deployment models
as the standard Mumble capability. Notably, Mumble clients
present the user a simple, easy to use, graphical user interface
that can be easily understood with minimal training. An image
of the modified client running on an iPod Touch can be seen
in Figure 6 where the client is running in push-to-talk mode.
This client is indistinguishable from the standard iOS Mumble
client. The Mumble software can also be deployed through an
app store model, or as binaries which can be loaded onto iOS
devices through Xcode.


Fig.  6:  The  Push-To-Talk  Client  GUI 


We integrated the iOS capability so that client handsets
encrypt their audio streams using the client's public key.
The proxy server computes over that encrypted data without
decrypting the data or sharing keys. The output of the
processing is sent to the client, where it is decrypted using the
client's private key. No keys are stored on the teleconference
server, so privacy is preserved even if an adversary views all
communication links and operations on the server.

This integration was relatively straightforward, with several
notable changes made to reduce packet drops and improve sound
quality:
1) The client application generated voice packets that
contained 480 samples at 48 kHz, or 10 ms worth of
sound. The sound driver, however, generated slightly
larger packets. As a result, the period of the sound
packets was slightly larger than 10 ms, and every
so often two sound packets were generated back to
back. The original server set a 10 ms timer and just
accepted one packet every 10 ms. We added a small
queue at the server so we did not drop packets when
we received two packets in a row very quickly.

2)  We  generated  new  frame  numbers  at  the  server  as 
opposed  to  re-using  the  client  frame  numbers.  The 
clients  correlated  the  frame  numbers  with  time.  This 
cut  down  on  the  time  jitter  with  regard  to  frame 
numbers. 

3) The encryption and decryption operations for our
applications were processor intensive, and were run
in batches of several audio packets at once. We
moved the encryption and decryption operations to
a low priority thread and had the higher priority
thread accept and queue new audio packets (both
from the network, and from the microphone). This
helped prevent a situation where audio packets
were dropped because we were too busy decrypting
or encrypting.
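The effect of the small server queue in item 1 can be illustrated with a toy simulation. It assumes a server that consumes at most one packet per 10 ms tick and a hypothetical arrival pattern in which two packets occasionally land in the same tick; the function name `simulate` and the queue depths are illustrative choices, not values from our implementation.

```python
from collections import deque


def simulate(arrivals_per_tick, queue_depth):
    """Return (delivered, dropped) for a one-packet-per-tick server."""
    queue = deque()
    delivered = dropped = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            if len(queue) < queue_depth:
                queue.append(1)         # packet waits for a later tick
            else:
                dropped += 1            # no room: packet is lost
        if queue:                       # the tick consumes one packet
            queue.popleft()
            delivered += 1
    return delivered, dropped


if __name__ == "__main__":
    pattern = [1, 1, 2, 0, 1, 2, 0]     # back-to-back packets on ticks 3 and 6
    print("depth 1:", simulate(pattern, queue_depth=1))
    print("depth 4:", simulate(pattern, queue_depth=4))
```

With a depth of one, every back-to-back pair loses a packet; a slightly deeper queue absorbs the burst and drains it during the following quiet tick.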

The goal of our changes was to reduce the drop rate of
packets, which was an issue with initial prototypes. This, in
turn, allowed us to increase the audio sampling rate. As a result,
we were sampling 10-bit samples at a rate of 48 kHz. This
configuration provides a sound quality substantially better than
PSTN as long as there are only a few packet drops.

After sampling the audio, we queue and encode 90 ms
blocks of this data into our encoder. We designed the system
to accommodate 4 speakers, so the homomorphic
mixer adds four 10-bit integers homomorphically, resulting
in a 12-bit plaintext before the encoding layering. If we
use a ring dimension $n = 1024$, we are required to use
2-layer encoding and have a resulting plaintext modulus of
$p = 2^{24} = 16777216$. This encoding and encryption results in
a root Hermite factor of $\delta = 1.006$, which is currently believed
to be at least as secure as AES-128. With these parameter
settings we observed that when running on an iPhone 5s, the
encoding and encryption operation took a mean time of 9.2 ms
and decryption and decoding took 4.6 ms. The summation on
the VoIP server took 0.5 ms. Transport of encrypted VoIP traffic
from Cambridge, MA to the Northern Virginia Amazon AWS
servers took an average of 15 ms. This results in a mean latency
much less than our 100 ms threshold for VoIP traffic, well
within reasonable bounds both in theory and in practice.
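The arithmetic behind these figures can be reproduced with a short sketch. It assumes that the return trip from the server takes the same 15 ms as the measured uplink and that audio buffering time is excluded from the budget; both are simplifications made here, and the function names are hypothetical.

```python
def mixed_bits(sample_bits, speakers):
    """Bits needed to hold the sum of `speakers` samples without overflow."""
    return sample_bits + (speakers - 1).bit_length()


def plaintext_modulus(sample_bits, speakers, layers):
    """Plaintext modulus when `layers` mixed samples share one coefficient."""
    return 1 << (mixed_bits(sample_bits, speakers) * layers)


def processing_latency_ms(encrypt_ms, uplink_ms, mix_ms, downlink_ms, decrypt_ms):
    """Sum of the per-stage delays from microphone side to speaker side."""
    return encrypt_ms + uplink_ms + mix_ms + downlink_ms + decrypt_ms


if __name__ == "__main__":
    print(mixed_bits(10, 4))                      # bits per mixed sample
    print(plaintext_modulus(10, 4, layers=2))     # p for 2-layer encoding
    print(processing_latency_ms(9.2, 15.0, 0.5, 15.0, 4.6))
```

Four 10-bit samples sum to at most 4092, which fits in 12 bits, and two such layers give the 24-bit plaintext modulus quoted above.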

VII.  Experimental  Results 

We experimentally evaluated the performance of the VoIP
service by deploying our encrypted VoIP servers in each of
the Amazon AWS data centers across the world. We then
connected iPod Touch clients to each of the servers through
various connection types in the metro area of a United States
city in southern New England. These connections included
an 802.11n wireless enterprise gateway connected to a
high-speed enterprise Internet connection, 4G LTE, 3G and 2G
connections over the T-Mobile commercial wireless service and
an AT&T DSL connection in a rural area outside the city.

We measured the upload and download throughput of the
connections, the drop rate of VoIP packets routed through
the various server locations and the subjective quality of the
VoIP teleconference session as judged by the experimenters.
The upload and download throughput was measured by the Ookla
throughput measurement app [19] on the client devices. VoIP
drop rates were measured experimentally by modifying the
VoIP servers to measure drop rates. Voice quality was
measured in comparison to PSTN voice quality, where "Excellent"
means the VoIP conversation was better than PSTN, "Good"
means the VoIP conversation was comparable to PSTN, "Poor"
means the VoIP conversation was worse than PSTN but still
usable for communication, and "Unusable" means the
connection was useless for communication.

All of the experiments were run over a 2 hour period on
a weekday evening using 2 iPod Touch clients with servers
deployed on Amazon AWS t1.micro instances [20]. Each of
the clients was on an independent connection to the Internet at
all times, so there was low likelihood of one client contributing
substantially to congestion for the other client.

Table I shows the upload and download throughput
observed by each of the clients for each of the connections. Note
that the rural DSL service provided better throughput than the
2G connection and better download throughput than the 3G
connection.

TABLE I: Experimentally Measured Data Throughput in Mb/s
for Connection Types

Connection Type      Upload Rate (Mb/s)   Download Rate (Mb/s)
Enterprise 802.11n   38.22                36.53
4G LTE               35.82                17
3G                   6.31                 0.43
2G                   0.2                  0.16
Rural DSL            2.55                 0.47

Table  II  shows  the  packet  drop  rates  observed  at  each  of  the 
servers  at  the  various  Amazon  AWS  locations  for  the  various 
client  connection  types.  Note  that  distance  between  the  client 
and  server  had  only  a  minor  impact  on  drop  rates,  while  the 
connection  type  had  a  very  large  impact  on  drop  rates.  This 
implies  that  the  connection  could  be  a  bottleneck  for  the  VoIP 
service. 

Table III shows the subjective VoIP teleconference quality
measurements observed through each of the servers at the
various Amazon AWS locations for the various client
connection types. Note that distance between the client and server
had almost no observed impact on voice quality, while the
connection type had a very large impact on voice quality.

We observed that all of the various connections supported
acceptable VoIP teleconference capabilities except for the 2G
connections. Over all of the acceptable connections, the lowest
upload or download throughput observation was on the 3G
download: 0.43 Mb/s. Because the VoIP download and upload
data flows are symmetric, this implies at least a 0.43 Mb/s
upload and download throughput connection is required to
support VoIP teleconferencing using our prototype.
TABLE II: Packet Drop Rates For Various Server Locations and Client Internet Connection Types.

Server Location   Client Location   Enterprise 802.11n   4G LTE   3G    2G    Rural DSL
N. Virginia       S. New England    0%                   10%      10%   66%   33%
Oregon            S. New England    0%                   2%       3%    71%   35%
N. California     S. New England    0%                   7%       8%    67%   34%
Ireland           S. New England    0%                   7%       7%    73%   38%
Singapore         S. New England    5%                   2%       2%    68%   39%
Tokyo             S. New England    1%                   3%       4%    69%   37%
Sydney            S. New England    5%                   3%       3%    67%   34%
Sao Paulo         S. New England    0.30%                4%       6%    76%   34%

TABLE III: Teleconference Quality For Various Server Locations and Client Internet Connection Types.

Server Location   Client Location   Enterprise 802.11n   4G LTE   3G     2G         Rural DSL
N. Virginia       S. New England    Excellent            Good     Good   Unusable   Poor
Oregon            S. New England    Excellent            Good     Good   Unusable   Poor
N. California     S. New England    Excellent            Good     Good   Unusable   Poor
Ireland           S. New England    Excellent            Good     Good   Unusable   Poor
Singapore         S. New England    Excellent            Good     Good   Unusable   Poor
Tokyo             S. New England    Excellent            Good     Good   Unusable   Poor
Sydney            S. New England    Excellent            Good     Good   Unusable   Poor
Sao Paulo         S. New England    Excellent            Good     Good   Unusable   Poor

In addition to our tests of connection-server pairings, we
also tested the scalability of the number of clients that could be
supported on a single server. For this experiment we connected
7 iPod Touch and/or iPhone 5s clients at various connections
on the eastern United States seaboard to a single VoIP server
in the Amazon AWS Northern Virginia data center. With these
7 connections running at once and 4 people speaking
simultaneously, the conversation was as good as is possible
with 4 people speaking at the same time, and no voice
distortion was observed by the 3 non-speaking client users.

VIII.  Related  Work 

Up to now, advances in secure VoIP technologies have
focused on providing security for data in transit [1], [2],
along with other general security challenges such as DDoS attacks
[21] and identity and key management [22], among many others
[23], [24]. These are all important challenges for secure
VoIP teleconferencing capabilities, but a reliance on
point-to-point encryption between participants has too often led to
complicated VoIP teleconferencing systems and protocols [25].
In general, the complicated layering of protection mechanisms
is often difficult to execute in practice, leading to overly
complicated systems which are difficult to build and maintain.
Further, these complicated systems are often difficult to
audit for security [26]-[28]. Although the partial
security solutions have generally worked very well in isolation and
served their purposes, the at-times complicated layering
of these protocols has resulted in the introduction of possible
security holes which have enabled data leakage.

To the best of our understanding, there have been no
VoIP technologies which provide end-to-end encryption. Our
solution seeks to provide a clean-slate data protection
capability that is also compatible, or at least easily integrated,
with existing VoIP protocols and architectures. Because we
provide end-to-end data encryption, our solution protects data
against leakage even when layered with existing VoIP
protocols for signaling and transport. Besides providing security
against data leakage due to compromised servers, end-to-end
encrypted VoIP teleconferencing has the potential to greatly
simplify existing VoIP protocols, resulting in much simpler
designs and implementations that are more efficient and
easier to audit.

The design and implemented prototype of our end-to-end
encrypted VoIP teleconferencing are driven by and build
on recent breakthroughs in practical Fully Homomorphic
Encryption (FHE). Recent breakthroughs in Homomorphic
Encryption have shown that it is theoretically possible to
securely run arbitrary computations over encrypted data
without decrypting the data [29], [30]. There has been recent work
on designing and implementing variations of homomorphic
encryption schemes [10], [31]-[39]. These implementations
have become increasingly practical, with published results on
both the runtime of isolated secure computing operations
for some implementations [34], [37], [38] and evaluations of
composite functions like AES [33], [36], [39].

Current approaches to designing FHE schemes rely on a
special, highly complex and computationally difficult operation
called bootstrapping [40] to support the encrypted execution
of arbitrary functions. As such, we use a simplification of the
general FHE designs called “leveled” homomorphic encryption
or Somewhat Homomorphic Encryption (SHE) that supports
limited-depth computations, such as vector addition. This is
much more efficient because it does not require the use of
bootstrapping.

Besides the runtime challenges of HE designs, there are
serious application issues associated with data structures
and representations [39]. Furthermore, it has not been well
explored how to convert existing data structures and
algorithms into forms that can be efficiently executed using FHE
technologies. This is because FHE provides a very different
computation model from existing RAM computing devices, and
the porting of known data structures and algorithms (such as
for VoIP mixing) is non-trivial, especially for highly efficient
encrypted execution of these algorithms over the encrypted
input data. As an example of these limitations, early uses of FHE
relied on encrypting individual bits in each ciphertext. These
limitations, in addition to the inherent computational cost of secure
computing using known FHE schemes, have until now prevented
the practical use of FHE. Our innovation comes from designing
a set of data structures, a data encoding method (which we refer
to as a vocoder) and a homomorphic mixing operation which
together support a practical implementation of end-to-end
encrypted VoIP teleconferencing.

In  particular,  a  key  innovation  of  ours  is  to  go  beyond 
simple  bit-per-ciphertext  encodings  by  placing  entire  VoIP 
data  frames  into  each  ciphertext.  These  codec  designs  are 
in  some  sense  much  simpler  than  existing  modern  codecs, 
such  as  the  mu-law  encoders  [41]  which  are  much  more 
common  in  modern  VoIP  systems.  There  have  been  prior 
known  approaches  to  Additive  Homomorphic  Encryption,  such 
as  Paillier  encryption  [42],  but  these  approaches  have  not 
been  practically  employed  to  support  encrypted  VoIP  mixing. 
Further,  there  has  been  no  prior  work  that  has  investigated  the 
data  structures  required  to  support  end-to-end  encrypted  VoIP 
teleconferencing  with  homomorphic  mixing. 

There have been few other approaches to secure VoIP
teleconferencing that aim to provide security properties
such as end-to-end encryption. Most relevant is the
work in [43], which discusses a VoIP teleconferencing approach
based on Secure Multi-Party Computation (SMC) [44]. This
prior SMC-based approach is also built by modifying the
Mumble/Murmur software, and our team received
implementation advice from the authors of [43]. Unlike our HE-based
approach, which requires only one untrusted server for
end-to-end encrypted VoIP teleconferencing, the SMC-based approach
in [43] requires that every participant in the teleconference
have at least one trusted server.

IX.  Discussion  and  Ongoing  Work 

Given our initial design goals, our implementation and our
experimentation, our assessment is that we met our initial
design criteria. More specifically, with respect to the previously
discussed design goals:

1)  Sound Quality: The end-to-end encrypted VoIP
teleconferencing capability provides sound quality at least
as good as a Public Switched Telephone Network
(PSTN) with full-duplex capability.

2)  Latency: The end-to-end encrypted VoIP
teleconferencing capability provides an end-to-end latency of
less than 90ms for trans-continental conversations,
better than our design goal.

3)  Scalability: The end-to-end encrypted VoIP
teleconferencing capability supports four
people speaking simultaneously, and we have
experimentally verified that participants can listen to the
audio stream without a performance impact.

4)  Secure: The end-to-end encrypted VoIP
teleconferencing capability provides a root Hermite factor
of 1.006, which provides an encryption work factor
at least as good as the work factor for AES-128,
to the best current understanding of both of these
cryptosystems.

5)  Resource Efficient: The ciphertext expansion of our
encrypted VoIP data is roughly a factor of 5. This is
highly efficient compared to many other encryption
schemes, especially current homomorphic encryption
schemes.


6)  Wide  Geographic  Area:  We  have  tested  the  end- 
to-end  encrypted  VoIP  teleconferencing  capability 
with  speakers  in  multiple  eastern  US  states  (Virginia 
and  Massachusetts)  and  with  the  server  running  in 
the  Amazon  AWS  cloud  in  Northern  Virginia.  We 
have  also  tested  in  a  similar  scenario  with  the  VoIP 
server  in  many  Amazon  AWS  locations  and  clients 
on  the  eastern  seaboard  of  the  United  States  for 
various  connection  types.  In  all  of  these  situations 
performance  was  not  compromised  due  to  geography 
and  the  main  determinant  of  quality  was  connection 
throughput. 

7)  Portable: The end-to-end encrypted VoIP
teleconferencing capability is easily ported to other client and
server types. There is no reason our ANSI C libraries
could not be used to support integration with other
VoIP infrastructure, or even other kinds of clients
such as Android clients.

8)  Easily  Deployable:  The  end-to-end  encrypted  VoIP 
teleconferencing  capability  is  easy  to  deploy,  at  least 
as  easy  as  an  iOS  app. 

9)  Usable: The end-to-end encrypted VoIP
teleconferencing capability is intuitive and easy to use, as it
builds on the prior usability of the Mumble GUI.

10)  Extensible: The end-to-end encrypted VoIP
teleconferencing capability is relatively easy to modify to
add more advanced functionality at a
later date, including QoS management and encrypted
text message passing, among other capabilities.

Beyond our VoIP functionality, our HE implementation is
part of a long-term community vision to support a general,
practical and secure computing capability through a layered
services architecture. Part of our vision is to provide software
interfaces in our design for our highly optimized
implementations of the basic FHE operations (KeyGen, Encrypt, EvalAdd,
EvalMult, Decrypt) for both general and specific applications.

Although we only utilize limited-depth Somewhat and
Partially Homomorphic Encryption capabilities, our encryption
system design is a scaled-down version of a Fully
Homomorphic Encryption (FHE) scheme design. When used in
conjunction with a variation of a not previously implemented
bootstrapping scheme [45] simplified for power-of-2 rings, our
design offers the possibility of a much more general VoIP
teleconferencing capability that incorporates signal detection
and noise filtering operations on the encrypted VoIP channels.
This more general design would enable protection against
some of the more practical attacks that could be made by an
adversary, such as noise injection attacks where an adversary
inserts noise into a VoIP teleconferencing session to reduce the
ability of participants to hear one another. Using more general
FHE capabilities, we could enable the untrusted cloud host
to securely filter the encrypted VoIP signals before or after
mixing to reduce the impact of insertion attacks.

A further aspect of our layered architecture vision is an
ability to mix-and-match a computing substrate at the server
for much larger scalability and throughput. Although not
an immediate focus of the results reported here, our FHE
design ports to other high-performance and low-cost parallel
computing environments such as FPGAs [18] and GPUs [46]
operating as “FHE co-processors”. If ported to a dedicated
FPGA co-processor, the runtime of our underlying SHE/FHE
implementation can be greatly improved compared
to the runtime of the corresponding interpreted CPU-only
implementation which we discuss herein.

Taken together, we see our design and experimentation with
our end-to-end encrypted VoIP teleconferencing capability
as a highly practical and extensible implementation
that protects VoIP teleconferencing users against data leakage
through a very simple but highly secure design. Our primary
path forward is to add more functionality, add protection against
other kinds of attacks and increasingly leverage the inherent
parallelism of our design at multiple levels of our
implementation.

Abstract

In  this  paper  we  report  on  our  work  to  design,  implement  and  evaluate  a  Fully  Homomorphic 
Encryption  (FHE)  scheme.  Our  FHE  scheme  is  an  NTRU-like  cryptosystem,  with  additional  support 
for  efficient  key  switching  and  modulus  reduction  operations  to  reduce  the  frequency  of  bootstrapping 
operations.  Ciphertexts  in  our  scheme  are  represented  as  matrices  of  64-bit  integers.  The  basis  of  our 
design  is  a  layered  software  services  stack  to  provide  high-level  FHE  operations  supported  by  lower- 
level  lattice-based  primitive  implementations  running  on  a  computing  substrate.  We  implement 
and  evaluate  our  FHE  scheme  to  run  on  a  commodity  CPU-based  computing  environment.  We 
implemented  our  FHE  scheme  to  run  in  a  compiled  C  environment  and  use  parallelism  to  take 
advantage  of  multi-core  processors.  We  provide  experimental  results  which  show  that  our  FHE 
implementation  provides  at  least  an  order  of  magnitude  improvement  in  runtime  as  compared  to 
recent  publicly  known  evaluation  results  of  other  FHE  software  implementations. 


1  Introduction 

Recent breakthroughs in Homomorphic Encryption have shown that it is theoretically possible to securely
run arbitrary computations over encrypted data without decrypting the data. There has been
recent work on designing and implementing variations of Somewhat Homomorphic Encryption (SHE) and
Fully Homomorphic Encryption (FHE) schemes. These implementations
have become increasingly practical, with published results on both the runtime of isolated EvalAdd and
EvalMult operations for some implementations and evaluations of composite functions like
AES.

Current approaches to designing FHE schemes rely on bootstrapping to arbitrarily increase the size of
computation supported by an underlying SHE scheme. Many current implementations of SHE and FHE
schemes rely on the manipulation of very large integers so that the schemes are both secure and
capable of supporting the evaluation of sufficiently large circuits. Prior SHE and FHE implementation
designs, for the most part, rely on single-threaded execution on commodity CPU-type
hardware, partially due to the difficulty of or lack of native support for multi-threaded execution with
underlying software libraries. This, in addition to the inherent computational cost of secure
computing using known SHE and FHE schemes, prevented the practical use of SHE and FHE.

*Sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory
(AFRL) under Contract No. FA8750-11-C-0098. The views expressed are those of the authors and do not necessarily reflect
the official policy or position of the Department of Defense or the U.S. Government. Distribution Statement “A” (Approved
for Public Release, Distribution Unlimited.)

In this paper we report on our work to design, implement and evaluate a scalable Fully Homomorphic
Encryption (FHE) scheme which addresses the limitations for secure arbitrary computation. Our
implementation uses a variation of a not previously implemented bootstrapping scheme [1] simplified for
power-of-2 rings. We also use a “double-CRT” representation of ciphertexts which was also discussed
in prior work. With this double-CRT representation, we can select parameters so that ciphertexts are secure
when represented as matrices of 64-bit integers, but still support the secure execution of programs on
commodity computing devices without expending unnecessary computational overhead manipulating large
multi-hundred-bit or even multi-thousand-bit integers.

We implement in software specialized lattice primitives such as Ring Addition, Ring Multiplication
and the Chinese Remainder Transform (CRT). We use our primitive implementations to construct the
FHE operations of Key Generation (KeyGen), Encryption (Enc), Decryption (Dec), Evaluation Addition
(EvalAdd), Evaluation Multiplication (EvalMult) and Bootstrapping (Boot). We use supporting Modulus
Reduction (ModReduce), Ring Reduction (RingReduce) and Key Switching (KeySwitch) operations
to augment the EvalMult operation and support larger depth computations without bootstrapping or
decreasing the security of our scheme.

We implemented this scheme to run in a compiled C environment and use parallelism to take
advantage of multi-core processors. Taken together, our implementation of these concepts points the way
to a practical implementation of FHE with a more efficient (and less frequent) use of the bootstrapping
operation. We evaluate the performance of our software library as a set of compiled executables in a
commodity CPU-based multi-core Linux environment. The evaluated performance of our library compares
favorably with the reported experimental CPU-based evaluation results of other recent
SHE and FHE schemes implemented in software.

This paper is organized as follows. In Section [2] we discuss how we represent ciphertexts in our
implementation. In Section [3] we define our NTRU-based FHE scheme. In Section [4] we discuss parameter
selection for our NTRU-based scheme to provide practical secure computing on commodity computing
hardware. In Section [5] we discuss our experimental results from our FHE scheme implemented in Matlab.
We conclude the paper with a discussion of our insights and next steps in Section [6]. Data tables of
experimental runtime results can be seen in Appendix [A].


2  Double-CRT  Ciphertext  Representation 

Previous SHE/FHE designs and implementations use two primary parameters to tune the security
provided and the supported depth of homomorphic computation (without resorting to bootstrapping): the
ring dimension $n$ and the ciphertext modulus $q$. With these parameters, fresh ciphertexts are typically
represented as $n$-element integer arrays, where each array element consists of at least $\log_2(q)$ bits. In
previous implementations the ring dimension $n$ typically ranged from 512 ($2^9$) to 16384 ($2^{14}$) and beyond,
while several hundred to several thousand bits were typically required to represent $q$. In the previous
implementations that use this “large-$q$” approach, the practicality challenge derives from the difficulty of
supporting both a large ring dimension $n$ (which provides comparatively better security) and a large $q$
(which increases the depth of computation supported).

The requirement of a very large $q$ is potentially problematic, because the number of clock cycles to
support mod-$q$ operations using naive “big integer” arithmetic grows at least linearly (and often
quadratically) with the number of bits used to represent $q$ for even the simplest operations, e.g., modular addition
and multiplication. We use a variation of the double-CRT approach discussed in [15] to circumvent this
problem using the standard technique of a “residue number system” (based on the Chinese remainder
theorem over the integers) to represent ciphertexts as $t$ length-$n$ integer vectors of mod-$q_i$ values instead
of a single integer vector mod $q$, where $q = q_1 \cdot \ldots \cdot q_t$ for pairwise coprime moduli $q_i$. For our ciphertext
representation we store these $t$ length-$n$ integer vectors of mod-$q_i$ values as an $n \times t$ integer matrix.
With our double-CRT approach, the number of moduli ($t$) grows to support the secure execution of
larger programs, but more bits are not required to represent the moduli $q_1, \ldots, q_t$. Our implementation
supports the secure execution of depth $t - 1$ programs with $t$ moduli.
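
As a rough illustration of the residue number system idea, the following Python sketch stores an integer modulo q = q_1 * ... * q_t as t small residues, adds and multiplies independently on each residue, and recovers the result with the Chinese remainder theorem. The moduli 97, 101 and 193 and the helper names to_rns and from_rns are assumptions chosen for readability; they are not the 64-bit parameters or the interface of the implementation described in this paper. The sketch assumes Python 3.8 or later.

```python
from math import prod

def to_rns(x, moduli):
    """Represent the integer x by its residues modulo each q_i."""
    return [x % m for m in moduli]

def from_rns(residues, moduli):
    """Recombine residues into an integer mod q via the Chinese remainder theorem."""
    q = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = q // m                       # product of all other moduli
        x += r * partial * pow(partial, -1, m)  # pow(..., -1, m) is a modular inverse
    return x % q

# Illustrative pairwise coprime moduli (assumed values, far smaller than 64 bits).
moduli = [97, 101, 193]
q = prod(moduli)

a, b = 123456, 654321
ra, rb = to_rns(a, moduli), to_rns(b, moduli)

# Each residue "column" is processed on its own, using only small-word arithmetic.
r_sum = [(x + y) % m for x, y, m in zip(ra, rb, moduli)]
r_prod = [(x * y) % m for x, y, m in zip(ra, rb, moduli)]
```

Because every column is handled independently, adding a modulus widens the representation without increasing the word size needed for any single operation.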

The double-CRT representation is an extension of the Chinese Remainder Transform (CRT)
representation used in prior SHE and FHE implementations. Chinese remainder transforms are used
to convert ciphertexts from the natural “power basis” representation to the double-CRT representation.
This conversion can mathematically be represented as a multiplication by square $n \times n$ matrices, but
admits a fast, highly parallel evaluation procedure that is closely related to the Cooley-Tukey Fast
Fourier Transform (and others).
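
The matrix view of the transform can be sketched in a few lines of Python. The code below builds the n x n matrix whose rows evaluate a polynomial at the odd powers of a primitive 2n-th root of unity psi modulo a prime q, and checks that multiplication in Z_q[x]/(x^n + 1) becomes coordinate-wise multiplication after the transform. The parameters n = 8 and q = 257, the output ordering, and the names find_psi, crt_matrix and crt are assumptions for illustration; this is the naive quadratic-time form, not the fast procedure referred to above.

```python
def find_psi(n, q):
    """Return psi with psi^n = -1 mod q (requires a prime q with q = 1 mod 2n)."""
    for a in range(2, q):
        psi = pow(a, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no suitable root of unity found")

def crt_matrix(n, q, psi):
    """Row i evaluates a polynomial at psi^(2i+1), a root of x^n + 1 mod q."""
    return [[pow(psi, (2 * i + 1) * j, q) for j in range(n)] for i in range(n)]

def crt(a, M, q):
    """Apply the transform as a plain matrix-vector product mod q."""
    return [sum(m * x for m, x in zip(row, a)) % q for row in M]

def negacyclic_mul(a, b, q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k, sign = (i + j, 1) if i + j < n else (i + j - n, -1)  # x^n = -1
            out[k] = (out[k] + sign * a[i] * b[j]) % q
    return out

n, q = 8, 257
psi = find_psi(n, q)
M = crt_matrix(n, q, psi)

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]

transform_of_product = crt(negacyclic_mul(a, b, q), M, q)
product_of_transforms = [(x * y) % q for x, y in zip(crt(a, M, q), crt(b, M, q))]
```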

As we discuss more in Section [4] below, each of the moduli $q_1, \ldots, q_t$ can be represented as 64-bit
integers and still support the secure execution of non-trivial programs. These 64-bit representations
greatly improve the practicality of our approach to SHE and FHE. By using 64-bit modular operations
to manipulate ciphertexts, keys, etc., we support faster low-level execution of the SHE operations on
commodity 64-bit (or even 32-bit) processors.

An advantage of our double-CRT NTRU approach is that the FHE operations can be highly
parallelized. Similar to the standard CRT representation, by using a double-CRT representation, the EvalAdd and
EvalMult operations and key sub-operations in Bootstrapping, Modulus Reduction, Ring Switching and
Key Switching can become $t$ naively parallelized operations. This greatly simplifies the secure execution of
programs using our FHE implementation as compared to other, non-CRT representations of ciphertexts.


3  Cryptosystem 

In this section we describe the somewhat homomorphic cryptosystem we use, which is very similar to
the NTRU system [15]; it was not until recently that its homomorphic properties were noticed
independently by Lopez-Alt et al. [18] and Gentry et al.

For ease of implementation and design simplicity, we limit our description to power-of-2 cyclotomic
rings. For ring dimension $n$ which is a power of 2, define the ring $R = \mathbb{Z}[x]/(x^n + 1)$ (i.e., integer
polynomials modulo $x^n + 1$). For a positive integer $q$, define the quotient ring $R_q = R/qR$ (i.e., integer
polynomials modulo $x^n + 1$, with coefficients from $\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$).

3.1  Basic  NTRU-Type  System 

In this subsection we provide a mathematical description of a somewhat homomorphic NTRU-based
scheme. The message space is $R_p$ for some integer $p \geq 2$, and most arithmetic operations are performed
modulo some $q \gg p$ that is relatively prime with $p$. Fast addition and multiplication in $R_q$ can be
performed by using the mod-$q$ Chinese Remainder Transform (CRT) representation of elements. The
basic operations of the scheme are as follows:

•  Gen: choose a short $f \in R$ such that $f = 1 \bmod p$ and $f$ is invertible modulo $q$, and a short $g \in R$.
Output $pk = h = g \cdot f^{-1} \bmod q$ and $sk = f$.

Note that $f$ is invertible modulo $q$ if and only if each of its mod-$q$ CRT coefficients is nonzero. The
CRT coefficients of $f^{-1}$ (modulo $q$) are just the mod-$q$ inverses of those of $f$.

Concretely, the short elements $f$ and $g$ can be chosen from discrete Gaussians. E.g., we can let
$f = p \cdot f' + 1$ for some Gaussian-distributed $f'$. Note that such an $f$ will have expectation (center) 1.
Using a zero-centered $f$ can have some advantages, and such an $f$ may be chosen using a more sophisticated
sampling algorithm.

•  Enc($pk = h$, $\mu \in R_p$): choose a short $r \in R$ and a short $m \in R$ such that $m = \mu \bmod p$. Output
$c = p \cdot r \cdot h + m \bmod q$.

Concretely, $m$ can naively be chosen as $m = p \cdot m' + \mu$ for a Gaussian-distributed $m'$, but again,
such an $m$ is not zero-centered. It is typically better to choose $m$ as a zero-centered random variable
congruent to $\mu$ modulo $p$.

•  Dec($sk = f$, $c \in R_q$): compute $\bar{b} = f \cdot c \bmod q$, and lift it to the integer polynomial $b \in R$ with
coefficients in $[-q/2, q/2)$. Output $\mu = b \bmod p$.

The homomorphic operations are defined as follows:

•  EvalAdd($c_0$, $c_1$): output $c = c_0 + c_1 \bmod q$.

•  EvalMult($c_0$, $c_1$): output $c = c_0 \cdot c_1 \bmod q$.

With the use of EvalMult, the decryption procedure needs to be modified. Define the “degree” of
ciphertexts as follows: a freshly generated ciphertext has degree 1, and the degree of $c$ = EvalMult($c_0$, $c_1$)
is the sum of the degrees of $c_0$ and $c_1$. Then decryption of a ciphertext $c$ of degree at most $d$ is the same
as above, except that we instead compute $\bar{b} = f^d \cdot c \bmod q$.
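
The operations above can be exercised end to end with a toy Python sketch. Everything concrete in it is an assumption made for illustration: the tiny parameters n = 4, p = 2 and q = 65537 (which offer no security), ternary coefficients in place of discrete Gaussians, the naive quadratic-time CRT used only to invert f, and all function names. It is meant to show the algebra of Gen, Enc, Dec, EvalAdd and EvalMult, not the implementation evaluated in this paper. It assumes Python 3.8 or later.

```python
import random

def negacyclic_mul(a, b, q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k, sign = (i + j, 1) if i + j < n else (i + j - n, -1)  # x^n = -1
            out[k] = (out[k] + sign * a[i] * b[j]) % q
    return out

def find_psi(n, q):
    """Return psi with psi^n = -1 mod q (requires a prime q with q = 1 mod 2n)."""
    for a in range(2, q):
        psi = pow(a, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no suitable root of unity found")

def crt(a, psi, q):
    """Naive mod-q CRT: evaluate a(x) at the odd powers of psi."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]

def inv_crt(ev, psi, q):
    """Inverse of crt(): recover power-basis coefficients from evaluations."""
    n = len(ev)
    psi_inv, n_inv = pow(psi, -1, q), pow(n, -1, q)
    return [n_inv * sum(ev[i] * pow(psi_inv, (2 * i + 1) * j, q)
                        for i in range(n)) % q for j in range(n)]

def small(n, rng):
    """Short element with ternary coefficients (stand-in for a discrete Gaussian)."""
    return [rng.choice((-1, 0, 1)) for _ in range(n)]

def keygen(n, p, q, psi, rng):
    while True:
        f = [p * x for x in small(n, rng)]
        f[0] += 1                          # f = p*f' + 1, so f = 1 mod p
        ev = crt(f, psi, q)
        if all(ev):                        # invertible iff no CRT coefficient is zero
            break
    f_inv = inv_crt([pow(e, -1, q) for e in ev], psi, q)
    h = negacyclic_mul(small(n, rng), f_inv, q)   # h = g * f^{-1} mod q
    return h, f

def encrypt(h, mu, p, q, rng):
    n = len(h)
    r = small(n, rng)
    m = [p * x + u for x, u in zip(small(n, rng), mu)]   # m = p*m' + mu
    prh = negacyclic_mul([p * x for x in r], h, q)
    return [(x + y) % q for x, y in zip(prh, m)]

def decrypt(f, c, p, q, degree=1):
    b = c
    for _ in range(degree):                # b = f^degree * c mod q
        b = negacyclic_mul(f, b, q)
    lifted = [x - q if 2 * x >= q else x for x in b]     # lift to [-q/2, q/2)
    return [x % p for x in lifted]

rng = random.Random(1)
n, p, q = 4, 2, 65537
psi = find_psi(n, q)
h, f = keygen(n, p, q, psi, rng)

mu0, mu1 = [1, 0, 1, 1], [0, 1, 1, 0]
c0, c1 = encrypt(h, mu0, p, q, rng), encrypt(h, mu1, p, q, rng)
c_add = [(x + y) % q for x, y in zip(c0, c1)]   # EvalAdd
c_mul = negacyclic_mul(c0, c1, q)               # EvalMult, result has degree 2
```

With these bounds the noise after one multiplication stays far below q/2, so the degree-2 ciphertext still decrypts correctly when f is applied twice.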

3.2  Key  Switching 

Key switching converts a ciphertext $c_1$ of degree at most $d$, encrypted under a secret key $f_1$, into a degree-1
ciphertext $c_2$ encrypted under a secret key $f_2$ (which may or may not be the same as $f_1$). This requires
publishing a “hint”

$a_{1 \to 2} = m \cdot f_1^d \cdot f_2^{-1} \bmod q,$

for a short $m \in R$ congruent to 1 modulo $p$. (Concretely, we can choose $m = p \cdot e + 1$ for a Gaussian-
distributed $e$, though a zero-centered $m$ is better.)

•  KeySwitch($c_1$, $a_{1 \to 2}$): output $c_2 = a_{1 \to 2} \cdot c_1 \bmod q$.

Note that $a_{1 \to 2}$, $c_1$ and $c_2$ can all be stored and operated upon in CRT form, so key switching is very
efficient: the hint is just one ring element, and the procedure involves just one coordinate-wise multiplication
of the CRT vectors. This compares quite favorably to key-switching procedures for other cryptosystems,
which typically require decomposing a ciphertext into several short ring elements and performing several
ring multiplications.
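
A minimal Python sketch of key switching carried out entirely on CRT vectors follows. Random vectors modulo an assumed prime q = 65537 stand in for the CRT forms of $f_1$, $f_2$, $m$ and $c_1$, so shortness and noise are not modeled at all; the sketch only checks the algebraic identity $f_2 \cdot c_2 = m \cdot f_1^d \cdot c_1 \bmod q$ that lets $c_2$ be decrypted under $f_2$ as a degree-1 ciphertext. The helper names are hypothetical, and Python 3.8 or later is assumed.

```python
import random

q, n, d = 65537, 8, 2          # assumed toy modulus, vector length and input degree
rng = random.Random(0)

def pointwise(a, b):
    """Ring multiplication in CRT form is coordinate-wise multiplication mod q."""
    return [(x * y) % q for x, y in zip(a, b)]

def power(a, e):
    return [pow(x, e, q) for x in a]

def inverse(a):
    """Ring inverse in CRT form: invert each (nonzero) coordinate mod q."""
    return [pow(x, -1, q) for x in a]

# Stand-ins for CRT vectors; the keys must have no zero coordinate to be invertible.
f1 = [rng.randrange(1, q) for _ in range(n)]
f2 = [rng.randrange(1, q) for _ in range(n)]
m = [rng.randrange(q) for _ in range(n)]
c1 = [rng.randrange(q) for _ in range(n)]

# The hint a_{1->2} = m * f1^d * f2^{-1} is computed once and published.
hint = pointwise(pointwise(m, power(f1, d)), inverse(f2))

# KeySwitch is then a single coordinate-wise multiplication.
c2 = pointwise(hint, c1)
```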

3.3  Ring  Reduction 

Ring reduction maps a ciphertext from ring dimension $n$ to a smaller ring dimension $n' = n/2^a$, where typically $a = 1$. Although
we describe a ring reduction operation for power-of-2 rings, more general ring switching approaches exist
and can be obtained from simple generalizations of the approach we describe here.

The basic ring switching operation is a Decompose algorithm, which maps a dimension-$n$ ring element to
dimension-$n'$ elements. Decompose($c$) works as follows:

•  Let $c = (c_0, \ldots, c_{n-1})$ be in the power basis and let $w = n/n'$.

•  We output ciphertexts $c'_i$ for each $i = 0, \ldots, w-1$ where $c'_i = (c_i, c_{w+i}, c_{2w+i}, \ldots, c_{(n'-1)w+i})$. I.e., $c'_i$
just consists of those entries of $c$ whose indices are $i \bmod w$.

Before applying Decompose we first key-switch the ciphertext to one which can be decrypted by a
“sparse” secret key $sk$, whose only nonzero entries in the power basis are at indices equal to $0 \bmod w$.
We perform the ring-switching on a ciphertext $c$ by performing key-switching on $c$ to get $\tilde{c}$ (encrypted
under $sk$), then calling Decompose($\tilde{c}$) to get the $c'_i$. The ciphertext $c$ should have plaintext data
only in its indices $0 \bmod w$. Otherwise, this data is lost during the ring reduction operation.
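
The role of the sparse key can be checked with a short Python sketch. It implements Decompose as strided slicing and verifies that, when the key is nonzero only at indices $0 \bmod w$, the coefficients of the dimension-$n$ product at indices $i \bmod w$ equal the dimension-$n'$ product of the compacted key with $c'_i$. The values n = 8, n' = 4, q = 257, the example vectors and the function names are illustrative assumptions, and no encryption or noise is modeled.

```python
def negacyclic_mul(a, b, q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1), with n = len(a)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k, sign = (i + j, 1) if i + j < n else (i + j - n, -1)  # x^n = -1
            out[k] = (out[k] + sign * a[i] * b[j]) % q
    return out

def decompose(c, n_prime):
    """Split c into w = n/n' vectors; vector i holds the entries at indices i mod w."""
    w = len(c) // n_prime
    return [c[i::w] for i in range(w)]

q, n, n_prime = 257, 8, 4
w = n // n_prime

# Sparse key: nonzero only at indices that are 0 mod w (assumed example values).
sparse_key = [3, 0, -2, 0, 1, 0, 2, 0]
c = [10, 20, 30, 40, 50, 60, 70, 80]

parts = decompose(c, n_prime)
full_product = negacyclic_mul(sparse_key, c, q)   # product in dimension n
small_key = sparse_key[::w]                       # same key viewed in dimension n'
```

Since decryption multiplies by the key and reads off coefficients, this is why each $c'_i$ can be handled in the smaller ring, with $c'_0$ carrying the data that sat at indices $0 \bmod w$.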

3.4  Modulus  Reduction 

Modulus reduction, initially proposed in [3], converts a ciphertext from modulus $q$ to a smaller modulus
$(q/q')$, where $q'$ divides $q$ (and so is also relatively prime with $p$), while also reducing the underlying noise
by about a $q'$ factor.

The basic description is as follows: given a ciphertext $c \in R_q$, we add to it a small integer multiple
of $p$ that is congruent to $-c \bmod q'$. This ensures that the underlying noise remains small, the plaintext
remains unchanged, and the resulting ciphertext is divisible by $q'$. Then we can divide both the ciphertext
and modulus by $q'$, which reduces the underlying noise term by a $q'$ factor as well.

Note that the final step (of dividing by $q'$) implicitly multiplies the underlying message by $(q')^{-1} \bmod p$.
We can either keep track of these extra factors as part of the ciphertext and correct for them as the
final step of decryption, or we can just ensure that $q' = 1 \bmod p$, so that division by $q'$ does not affect
the underlying message.

The following formal procedure uses the fixed (ciphertext-independent) value $v = (q')^{-1} \bmod p$, which
can be computed in advance and stored.

•  ModReduce($c$, $q$, $q'$):

1.  compute a short $d \in R$ such that $d = c \bmod q'$.

2.  compute a short $\Delta \in R$ such that $\Delta = (vq' - 1) \cdot d \bmod (pq')$. E.g., all of $\Delta$'s integer coefficients
can be in the range $[-pq'/2, pq'/2)$.

3.  let $d' = c + \Delta \bmod q$. By construction, $d'$ is divisible by $q'$.

4.  output $(d'/q') \in R_{(q/q')}$.
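
Because each step of ModReduce acts coefficient by coefficient in the power basis, the procedure can be sketched on a plain list of integer coefficients. In the Python sketch below the values p = 2, q' = 193 and q = 97 * 193 are assumptions for illustration, as are the function names, and the "ciphertext" is treated as if the secret key were 1, so each coefficient is simply noise plus message. The sketch checks that the result is divisible by q', that the value modulo p is preserved (here q' = 1 mod p), and that the magnitude shrinks by about a factor of q'. Python 3.8 or later is assumed.

```python
def centered(x, m):
    """Representative of x mod m in the range [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r

def mod_reduce(c, q, q_prime, p):
    """Coefficient-wise ModReduce from modulus q to q / q_prime."""
    v = pow(q_prime, -1, p)                 # fixed value v = (q')^{-1} mod p
    out = []
    for coeff in c:
        d = centered(coeff, q_prime)                          # step 1
        delta = centered((v * q_prime - 1) * d, p * q_prime)  # step 2
        d_prime = (coeff + delta) % q                         # step 3
        assert d_prime % q_prime == 0       # divisible by q' by construction
        out.append(d_prime // q_prime)                        # step 4
    return out

p, q_prime = 2, 193
q = 97 * q_prime

# Small signed values standing in for "noise plus message" coefficients.
noisy = [1000, -777, 12, 0, -5000]
c = [x % q for x in noisy]

reduced = mod_reduce(c, q, q_prime, p)
lifted = [centered(x, q // q_prime) for x in reduced]
```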

Following prior work, the above is most efficient to implement when $q = q_1 \cdots q_t$ is the product of several
small, pairwise relatively prime moduli; when $q'$ is one of those moduli (say, $q' = q_t$ without loss of
generality); and when $c$ is represented in “double-CRT” form, i.e., each of $c$'s mod-$q$ CRT coefficients is
itself represented in (integer) CRT form as a vector of mod-$q_i$ values, one for each $i$. Then the above
steps can be computed as follows:

1.  Computing $d$ is done by inverting the mod-$q_t$ CRT on the vector of mod-$q_t$ components of $c$ (leaving
the other mod-$q_i$ components unused), and interpreting the resulting coefficients as integers in
$[-q_t/2, q_t/2)$.

2.  Computing $\Delta$ is done by multiplying the coefficients of $d$ by the fixed scalar $(vq_t - 1)$ modulo $pq_t$.

3.  Adding $\Delta$ to $c$ is done by computing the double-CRT representation of $\Delta$ (i.e., applying each mod-$q_i$
CRT to $\Delta$), and adding it entry-wise to $c$'s double-CRT representation.

Note that the mod-$q_t$ CRTs of $\Delta$ and $c$ are just the negations of each other (by construction), so
their sum is the all-zeros vector. Therefore, there is no need to explicitly compute the mod-$q_t$ CRT
of $\Delta$.

4.  Computing $d'/q_t$ is done by dropping the mod-$q_t$ components in the double-CRT representation
of $d'$ (which are all zero anyway), and multiplying every remaining mod-$q_i$ component by the fixed scalar
$q_t^{-1} \bmod q_i$. (These scalars can be computed in advance and stored.)
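
The residue-level arithmetic of these four steps can be sketched for a single integer coefficient. The Python code below keeps only the integer (residue number system) layer and leaves out the polynomial CRTs, so in a real double-CRT ciphertext step 1 would first invert the mod-$q_t$ CRT and step 3 would first transform $\Delta$. The moduli 97, 101, 193, the choice p = 2 and the function names are assumptions for illustration. Python 3.8 or later is assumed.

```python
def centered(x, m):
    """Representative of x mod m in the range [-m/2, m/2)."""
    r = x % m
    return r - m if 2 * r >= m else r

def reduction_delta(x_mod_qt, qt, p):
    """Steps 1 and 2: the correction Delta, computed from the mod-q_t residue only."""
    v = pow(qt, -1, p)
    d = centered(x_mod_qt, qt)
    return centered((v * qt - 1) * d, p * qt)

def mod_reduce_rns(residues, moduli, p):
    """Drop the last modulus q_t from a value given by its residues mod each q_i."""
    qt = moduli[-1]
    delta = reduction_delta(residues[-1], qt, p)
    out = []
    for r, qi in zip(residues[:-1], moduli[:-1]):
        shifted = (r + delta) % qi                   # step 3, done per residue
        out.append(shifted * pow(qt, -1, qi) % qi)   # step 4: multiply by q_t^{-1} mod q_i
    return out

moduli = [97, 101, 193]    # assumed pairwise coprime moduli; q_t = 193
p = 2
x = 123456                 # one integer coefficient, smaller than the product of the moduli

residues = [x % m for m in moduli]
reduced = mod_reduce_rns(residues, moduli, p)
```

The mod-$q_t$ residue is used only to derive $\Delta$ and is then discarded, which matches the observation that the mod-$q_t$ components of $d'$ are all zero.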

3.5  Composed  EvalMult 

We use the Key Switching, Ring Reduction and Modulus Reduction operations as supporting functions
with EvalMult to improve noise management and enable more computation between calls to the
Bootstrapping operation. Taken together, we form a composite operation, which we call ComposedEvalMult,
from the sequential execution of an EvalMult, Key Switching and Modulus Reduction operation.

Ring Reduction is called during some ComposedEvalMult operations, depending on the level of
security provided by a ciphertext resulting from the Ring Reduction operation. As Modulus
Reduction operations are performed, the security provided by a ciphertext increases (as described in Section [4]).
Ring Reduction correspondingly reduces the level of security provided by a ciphertext. We implemented
our FHE library such that a minimum level of security $\delta'$ is provided at all times, and this level $\delta'$ is
a parameter selectable by the library user. If a call to a Ring Reduction operation will result in a level
of security $\delta < \delta'$, then the RingReduction is performed in the ComposedEvalMult operation.

Our conception is that, due to the ModReduction and RingReduction components of ComposedEvalMult,
it is feasible to coordinate the choice of the original ciphertext width $t$ and the scheduling of
ComposedEvalMult operations so that the final ciphertext resulting from secure circuit evaluation,
which needs to be decrypted, is only one column wide with respect to a single modulus $q_1$ and provides
a level of security at least as great as the original ciphertexts resulting from the encryption operation.
More explicitly, if we need to support a depth $t - 1$ computation, the initial encryptions should be only
$t$ columns wide to ensure that the final ciphertext is 1 column wide. Whereas the runtimes of Encryption,
EvalAdd and ComposedEvalMult depend on the ring dimension and the depth of computation supported,
the runtime of the Decryption operation would hence depend only on the final ring dimension after all
ring switching has been completed. If we need to decrypt a ciphertext that has multiple columns in our
double-CRT representation, we could perform multiple ModReduction operations to reduce this $t > 1$
ciphertext until we are left with a single mod-$q_1$ column.
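
The interplay between ciphertext width, ring dimension and the security floor can be illustrated with a toy schedule. The sketch below is an illustration under stated assumptions rather than the library's logic: it assumes every modulus has the same bit length, that each ComposedEvalMult drops exactly one modulus, that a Ring Reduction halves the ring dimension, and that security is measured by the root Hermite factor obtained by treating Equation (3) of Section 4 as an equality. The names `root_hermite`, `schedule` and `delta_max` (standing in for the user-selected bound) are hypothetical.

```python
def root_hermite(n, total_bits):
    """delta solving n = total_bits / (4 * lg(delta)); smaller delta means more security."""
    return 2.0 ** (total_bits / (4.0 * n))


def schedule(n, t, bits_per_modulus, delta_max):
    """Track (width, ring dimension) after each ComposedEvalMult.

    Assumptions: one modulus is dropped per call, and the ring dimension is
    halved whenever the halved ring would still satisfy delta <= delta_max.
    """
    trace = []
    while t > 1:
        t -= 1  # Modulus Reduction removes one column
        if n >= 2 and root_hermite(n // 2, t * bits_per_modulus) <= delta_max:
            n //= 2  # Ring Reduction is affordable at this width
        trace.append((t, n))
    return trace


print(schedule(1024, 4, 45, 1.02))  # [(3, 1024), (2, 1024), (1, 512)]
```

In this toy run the ring can only be halved once the ciphertext is a single column wide, because wider ciphertexts in the smaller ring would exceed the chosen bound.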

3.6  Bootstrapping 

The basis of our bootstrapping approach is a new approach to homomorphic rounding, which is
described in detail in [1]. We provide a high-level overview of this operation
here, simplified for our restriction to power-of-2 rings. This operation has the following steps:

1. Round the ciphertext: For each entry $v$ for residue $i$, we output $\mathrm{round}(v \cdot q / q_i)$, where the inner
expression is rational, and "round" means taking the nearest integer. Generally, $q$ is a power of two
chosen experimentally to be as small as possible.

2. Convert the plaintext modulus: This is a no-op under our simplifying assumptions.

3.  Lift  the  ciphertext  and  plaintext  moduli:  This  is  also  a  no-op  under  our  simplifying  assumptions. 

4.  Scale  the  ciphertext:  We  scale  up  the  ciphertext  by  a  Q/q'  factor  (rounding  to  nearest  integers  in 
the  power  basis),  and  embed  into  dimension  N  (new  ring  dimension)  as  well.  The  plaintext  modulus 
is  still  q' . 

5. Compute the homomorphic trace: The following steps are performed iteratively $\log_2(N)$ times:

(a) "Lift" the ciphertext modulus to $2Q$, which has the effect of making the plaintext modulus $2q$.

(b)  Apply  the  automorphism  from  PQ,  with  appropriate  key  switching  to  put  the  result  into  the 
same  key  as  the  original  ciphertext  in  the  iteration. 

(c)  Sum  the  original  and  resulting  ciphertexts. 

(d)  Divide  the  ciphertexts  by  2. 

6.  Perform  a  homomorphic  rounding:  This  operation  is  described  in  Appendix  B  of  [jQ. 
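
Step 1 involves only integer arithmetic on individual entries and can be sketched directly. The helper name `round_to_modulus` is hypothetical, and reducing the rounded value modulo q (so that an entry that rounds up to q wraps to 0) is our assumption about how the result is represented.

```python
def round_to_modulus(values, q_i, q):
    """Map each v in [0, q_i) to round(v * q / q_i) mod q using exact integers.

    (2*v*q + q_i) // (2*q_i) equals floor(v*q/q_i + 1/2). For odd q_i and q a
    power of two, v*q/q_i is never exactly halfway between two integers.
    """
    return [((2 * v * q + q_i) // (2 * q_i)) % q for v in values]


print(round_to_modulus([0, 3, 4, 48, 96], q_i=97, q=16))  # [0, 0, 1, 8, 0]
```

Integer arithmetic is used instead of floating point so that the result stays exact for moduli close to 64 bits.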

4  Parameter  Selection 

The selection of $n$ and $q_1, \ldots, q_t$ depends heavily on the plaintext modulus $p$, the depth of computation
that needs to be supported, and the desired security level. We capture the primary concerns influencing
the selection of a ring dimension $n$ and the moduli $q_1, \ldots, q_t$ at a high level as follows:

• The necessary ring arithmetic should be easily supported on the computation substrate, i.e.,
mod-$q_i$ operations (for $i \in \{1, \ldots, t\}$) should require few clock cycles.

• The moduli $q_1, \ldots, q_t$ are sufficiently large to enable sufficient noise shrinkage via modulus reduction.

•  The  ring  dimension  n  and  noise  parameters  are  sufficiently  large  so  the  scheme  provides  adequate 
security. 

•  The  ring  dimension  n  is  not  so  large  that  it  becomes  overly  time-consuming  and  memory-intensive 
to  manipulate  the  ciphertexts. 

• The plaintext modulus $p$ and any noise added to the ciphertext during encryption are sufficiently
small that we can evaluate reasonably sized circuits with correct decryption.
Table 1: Bit lengths of the moduli $q_i$ as a function of ring dimension for $p = 2$.

Ring dimension n          512    1024    2048    4096    8192   16384
Bit length log2(q_i)       44      45      47      48      50      51

We choose to add discrete Gaussian noise to the fresh ciphertexts, where $r = 6$ represents the selected
probability distribution parameter. We have found theoretically that the smallest modulus $q_1$ needs to
satisfy the expression

$$q_1 > 4 p r \sqrt{n}\, w \qquad (1)$$

in order to ensure successful decryption, where the parameter $w \approx 6$ represents an "assurance" measure
for correct decryption (essentially, the probability of decryption failure is bounded by the probability that
a normally distributed variable is more than $w\sqrt{2\pi}$ standard deviations from its mean), and $p \cdot r$ is the
Gaussian parameter of the noise used in fresh ciphertexts. (Hence $r$ is the Gaussian parameter of the
underlying NTRU-like problem.)

After selecting $q_1$, we select the remaining $q_i \in \{q_2, \ldots, q_t\}$ such that

$$q_i > 4 p^2 r^5 n^{1.5} w^5, \qquad (2)$$

which ensures that modulus reduction by a factor of $q_i$ sufficiently reduces the noise after a
ComposedEvalMult operation. For implementation simplicity, we set $q_i$ to be the smallest feasible solution to
$q_i > 4 p^2 r^5 n^{1.5} w^5$. Consequently all $q_i$ are represented by $\log_2(q_t)$ bits, leading to simpler
implementations.

Table 1 shows how many bits are required to represent $q_1, \ldots, q_t$ for varying ring dimensions for $p = 2$.
Note that all $q_1, \ldots, q_t$ can be represented in less than 64 bits.
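
The bounds (1) and (2) are simple enough to evaluate directly. The following sketch, with hypothetical function names, computes the bit length of the smallest integer exceeding each bound, assuming $w = 6$ exactly (the text gives $w \approx 6$) and ignoring the additional requirement that the moduli be pairwise relatively prime. Under those assumptions it reproduces the bit lengths listed in Table 1.

```python
import math


def q1_lower_bound(p, r, n, w):
    """Right-hand side of bound (1): 4 * p * r * sqrt(n) * w."""
    return 4 * p * r * math.sqrt(n) * w


def qi_lower_bound(p, r, n, w):
    """Right-hand side of bound (2): 4 * p^2 * r^5 * n^1.5 * w^5."""
    return 4 * p**2 * r**5 * n**1.5 * w**5


def bits_needed(bound):
    """Bit length of the smallest integer strictly greater than bound."""
    return (math.floor(bound) + 1).bit_length()


dims = [512, 1024, 2048, 4096, 8192, 16384]
print([bits_needed(qi_lower_bound(2, 6, n, 6)) for n in dims])
# [44, 45, 47, 48, 50, 51]
```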

Following 0 mi m ng, we use the standard "root Hermite factor" $\delta$ as the primary measure of
concrete security for a set of parameters. The most recent experimental evidence 0 suggests that
$\delta = 1.007$ would require roughly $2^{40}$ core-years on recent Intel Xeon processors to break. Using the
estimates from mm, we found that in order to achieve a security level $\delta$ for a depth of computation
$d = t - 1$ using the $t$ moduli $q_1, \ldots, q_t$, we need to ensure that

$$n > \lg(q_1 \cdots q_t) / (4 \lg(\delta)). \qquad (3)$$

Table 2 shows how $\delta$ varies as a function of the ring dimension and depth of computation supported.
Based on our analysis, if we impose the requirement that $\delta < 1.007$, then we would need to use ring
dimension $n = 16384$ to support depth $d = 13$ computations.
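
Treating bound (3) as an equality gives the root Hermite factor associated with a given ring dimension and total modulus size. The sketch below uses a hypothetical helper name and assumes, as our own reading rather than something stated in the text, that the total modulus size for depth $d$ can be approximated by $d$ times the Table 1 bit length for that ring dimension.

```python
def root_hermite_factor(n, total_modulus_bits):
    """delta such that n = total_modulus_bits / (4 * lg(delta))."""
    return 2.0 ** (total_modulus_bits / (4.0 * n))


# Bit lengths per modulus, copied from Table 1 (p = 2).
BITS_PER_MODULUS = {512: 44, 1024: 45, 2048: 47, 4096: 48, 8192: 50, 16384: 51}

for n, depth in [(512, 1), (1024, 1), (4096, 9), (16384, 13)]:
    delta = root_hermite_factor(n, depth * BITS_PER_MODULUS[n])
    print(f"n={n:6d} depth={depth:2d} delta={delta:.4f}")
```

A smaller value of the factor corresponds to a harder lattice problem, which is why a deeper computation (larger total modulus) forces a larger ring dimension for the same security target.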


Table 2: Security level $\delta$ as a function of depth of computation supported and ring dimension for $p = 2$.

Dim. \ Depth       1        3        5        7        9       11       13       15       17       19
512            1.015    1.045    1.077    1.109    1.143    1.178    1.213    1.250    1.288    1.327
1024           1.007    1.023    1.038    1.054    1.070    1.087    1.104    1.121    1.138    1.155
2048           1.004    1.012    1.020    1.028    1.036    1.044    1.053    1.061    1.069    1.078
4096           1.002    1.006    1.010    1.014    1.018    1.022    1.026    1.030    1.035    1.039
8192           1.0011   1.003    1.005    1.007    1.009    1.011    1.013    1.016    1.018    1.020
16384          1.0005   1.0016   1.003    1.003    1.005    1.006    1.007    1.008    1.009    1.010

5  Evaluation  Experiments 

We implemented our scheme in the Mathworks Matlab environment and used the Matlab Coder toolkit
[21] to generate an ANSI C representation of our implementation. We subsequently hand-modified the
auto-generated ANSI C to incorporate the pthreads library [4] to leverage parallelism. We compiled
this ANSI C using gcc to run as an executable in a Linux environment. We believe that additional
performance improvements could be obtained by implementing our FHE scheme natively in C.

We chose to implement our scheme in Matlab because it provides an interpreted computation
environment for rapid prototyping with native support for vector and matrix manipulation, which simplifies
implementation development. We found the Matlab syntax to be a natural fit for writing software to
support the primitive lattice operations needed for our double-CRT NTRU-based SHE design.

We wrote our Matlab implementation of our double-CRT NTRU SHE scheme using the Matlab
fixed-point toolbox. The Matlab fixed-point toolbox also provides a path toward generated HDL
implementations of our design that can be deployed for practical use on highly parallel computing hardware
such as FPGAs. Part of our vision for the use of our SHE design is to develop an FPGA implementation
of FHE [2[HJ.

We ran our compiled implementation on a 64-core server with 2.1 GHz Intel Xeon processors and 1 TB
of RAM in a CentOS environment. Although we had access to many resources, we used at most 10 GB
of memory and 20 cores during the evaluation of our software implementation.

We collected data on the runtime of the Encryption, EvalAdd, ComposedEvalMult, Decryption and
Bootstrapping operations over selections of depth of computation supported and ring dimension. We
ran 100 iterations of this collection procedure for each combination of $t$ and ring dimension. We used
different randomly selected key sets, plaintexts and encryption noise on every iteration to average out minor
variations in performance that may arise from these experimental random variables.
The raw mean runtime results are given in Tables 3 through 7 in Appendix A.

We collected data on the runtime of the Encryption, EvalAdd and ComposedEvalMult operations
for settings of $t \in \{2, 4, 6, \ldots, 20\}$ and for ring dimensions $n \in \{512, 1024, 2048, 4096, 8192, 16384\}$. We
collected data on the runtime of the Decryption operation on final ciphertexts, for computations with
fresh (input) ciphertexts with ring dimensions $n \in \{512, 1024, 2048, 4096, 8192, 16384\}$ and depth of
computation $t - 1$ for $t \in \{2, 4, 6, \ldots, 20\}$. Note that due to ring switching, decryption runtime depends
only on the dimension of the final ciphertext, which is a function of the initial ciphertext and the depth
of computation. We collected data on the runtime of the Bootstrapping operation for settings of the
"maximum" ring dimension $n \in \{512, 1024, 2048, 4096, 8192, 16384\}$ in which ciphertexts are expressed, where
the resulting ciphertext supports a depth-one computation before another bootstrapping operation is
required. As discussed in [1], the depth of computation required for bootstrapping is logarithmic in
the ring dimension. We are currently exploring practical trade-offs in the
scheduling of bootstrapping to enable more computation between bootstrapping calls.

Our experimental results show that runtimes grow linearly with ring dimension $n$ and the ciphertext
width $t$, where $t - 1$ is the depth of computation that can be supported such that bootstrapping or
decryption can still be performed with a high probability of recovering a correctly decrypted ciphertext.
This makes intuitive sense because as we double either the ring dimension or the ciphertext width, we roughly
double the amount of computation that needs to be performed with every Encryption, EvalAdd and
ComposedEvalMult operation. Similar results hold for Decryption (Table 6), which shows a linear dependence of
runtime on ring dimension, but under the assumption that decryption occurs after $t - 1$ ModReduction
operations, including ModReduction operations bundled in ComposedEvalMult operations. Our initial
results show that Bootstrapping runtime is similarly linear with respect to the maximum ring
dimension. As compared to the results reported in [12] [23] [24], our FHE software implementation provides
order-of-magnitude improvements in the runtime of the FHE operations.
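
One way to check the linear-scaling claim against the tabulated data is to fit a slope on a log-log scale: a slope near 1 indicates runtime proportional to ring dimension. The sketch below does this for the depth-1 column of Table 5 (ComposedEvalMult runtime in ms); the helper name `loglog_slope` is hypothetical and the fit is an ordinary least-squares line, which is our choice of method rather than one described in the text.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log2(y) against log2(x)."""
    lx = [math.log2(x) for x in xs]
    ly = [math.log2(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


dims = [512, 1024, 2048, 4096, 8192, 16384]
# Depth-1 column of Table 5, in milliseconds.
cem_depth1_ms = [16.03, 29.15, 49.17, 99.56, 196.83, 463.92]
print(round(loglog_slope(dims, cem_depth1_ms), 2))
```

The same function can be applied to any row or column of Tables 3 through 7 to compare scaling in ring dimension with scaling in depth.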


6  Discussion  and  Looking  Forward 

Our FHE implementation is part of our long-term vision to support a general, practical and secure
computing capability through a layered services architecture. Part of our vision is to provide software
interfaces in our design for our highly optimized implementations of the basic FHE operations (KeyGen,
Encrypt, EvalAdd, EvalMult, Decrypt) so that users can construct general applications that require secure
computation on encrypted data, with automated calls to supporting operations such as Ring Switching,
Key Switching, Modulus Reduction and Bootstrapping. Inherent to this architecture vision is our FHE
implementation of lattice-based computational primitives, which form a lower layer of our envisioned
architecture. We use primitives such as ring addition, ring multiplication, modulus operations and
the Chinese Remainder Transforms, which run on commodity computing devices such as CPUs and FPGAs.
We designed this modular approach to the implementation of the SHE operations and the underlying
core primitives so that we can 1) augment these operations with additional operations such as a
bootstrapping operation (which enables FHE), or 2) replace the implementations of a subset of the
operations or primitives as implementation advances are made.

A further aspect of our layered architecture vision is our ability to mix-and-match a computing
substrate at lower levels of our architecture. Although not an immediate focus of the results reported here,
the double-CRT representation, coupled with the 64-bit integer representation, simplifies parallelization
of our FHE scheme for easier porting to other high-performance and low-cost parallel computing
environments such as FPGAs dig and possibly even GPUs m-. If ported to a dedicated FPGA co-processor,
the runtime of our underlying SHE/FHE implementation can be greatly improved as compared to
the runtime of the corresponding interpreted CPU-only implementation which we discuss herein.
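
The reason the double-CRT representation parallelizes well is that each modulus row can be processed independently with word-sized arithmetic. The sketch below illustrates this for an entry-wise addition; it is not the pthreads code described in Section 5, the names `add_row` and `eval_add` are hypothetical, and Python threads are used only to show the data independence (they do not give a real speed-up for this kind of arithmetic in CPython).

```python
from concurrent.futures import ThreadPoolExecutor


def add_row(args):
    """Entry-wise addition of one residue row modulo its own modulus."""
    row_a, row_b, q_i = args
    return [(a + b) % q_i for a, b in zip(row_a, row_b)]


def eval_add(ct_a, ct_b, moduli, workers=4):
    """Add two double-CRT ciphertexts; rows share no data, so they can run concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(add_row, zip(ct_a, ct_b, moduli)))


print(eval_add([[1, 2], [3, 4]], [[96, 5], [100, 0]], [97, 101]))  # [[0, 7], [2, 4]]
```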

Taken together, we see our design and experimentation with our NTRU-based FHE scheme as a
stepping-stone to a practical implementation of FHE through our layered architecture vision. Our
primary path forward is to increasingly leverage the inherent parallelism of our design at multiple levels of
our implementation. At a low level we are working to port our lattice-based primitives to operate on
commodity FPGAs. This higher level parallelism offers the possibility of more practical SHE and FHE
on both multi-core CPUs and multiple parallel FPGAs operating as "FHE co-processors".
A Experimental Results


Table 3: Encryption Runtime (ms) vs. Depth of Computation Supported and Ring Dimension for p = 2.

Dim. \ Depth       1        3        5        7        9       11       13       15       17       19
512             2.32     2.83     2.86     3.27     3.39     3.25     4.38     4.64     5.35     5.66
1024            3.87     5.33     5.17     5.98     5.68     5.63     6.94     8.40     9.04     9.20
2048            6.26     6.48     7.01     7.47     7.94     8.78    12.70    13.03    13.05    14.52
4096           12.08    12.27    13.04    14.87    17.38    17.65    20.73    17.46    21.57    22.13
8192           24.53    25.18    26.13    29.07    30.81    32.15    34.43    32.46    36.16    37.90
16384          52.30    55.02    58.05    59.71    60.29    61.98    63.44    64.99    69.96    72.89

Table 4: EvalAdd Runtime (ms) vs. Depth of Computation Supported and Ring Dimension for p = 2.

Dim. \ Depth       1        3        5        7        9       11       13       15       17       19
512             0.21     0.32     0.42     0.54     0.64     0.73     1.26     2.11     2.90     3.12
1024            0.30     1.04     0.47     0.57     0.72     0.74     1.40     2.72     2.85     2.93
2048            0.37     0.45     0.55     0.67     0.80     1.00     1.97     3.00     3.04     3.24
4096            0.56     0.65     0.74     0.91     1.92     2.07     2.25     2.43     3.73     3.54
8192            0.89     1.01     1.20     1.36     2.46     2.70     3.69     3.23     5.05     5.44
16384           1.58     1.82     2.12     2.39     3.99     4.19     4.27     4.77     7.16     7.29

Table 5: ComposedEvalMult Runtime (ms) vs. Depth of Computation and Ring Dim. for p = 2.

Dim. \ Depth       1        3        5        7        9       11       13       15       17       19
512            16.03    22.73    23.32    22.65    22.87    22.96    24.35    25.24    25.37    25.78
1024           29.15    37.85    39.05    39.11    38.79    39.24    39.49    39.59    39.52    39.68
2048           49.17    66.31    66.77    67.41    67.15    68.38    68.22    69.27    69.45    71.09
4096           99.56   140.42   140.71   141.42   141.26   142.75   143.52   145.51   144.61   148.31
8192          196.83   279.37   280.42   284.40   283.98   285.69   289.59   286.55   292.69   295.69
16384         463.92   623.19   622.74   628.87   630.43   633.37   639.52   642.80   651.20   659.88
Table 6: Decryption Runtime (ms) vs. Depth of Computation Supported and Initial Ring Dim. for p = 2.

Dim. \ Depth       1        3        5        7        9       11       13       15       17       19
512             0.40     0.26     0.13     0.14     0.10     0.10     0.06     0.06     0.06     0.06
1024            0.87     0.38     0.18     0.11     0.11     0.11     0.11     0.11     0.05     0.05
2048            1.92     0.84     0.38     0.38     0.22     0.22     0.22     0.22     0.12     0.12
4096            3.36     1.70     0.84     0.86     0.37     0.39     0.38     0.22     0.22     0.21
8192            7.22     3.43     1.67     1.72     0.85     0.87     0.86     0.87     0.39     0.40
16384          15.36     7.18     3.37     3.37     1.67     1.67     1.67     1.73     0.87     0.85

Table 7: Bootstrapping Runtime (s) vs. Ring Dimension for p = 2.

Ring Dimension      512    1024    2048    4096    8192   16384
Runtime (s)         5.8      13      26      60     125     275
9 LIST OF ABBREVIATIONS AND ACRONYMS


2G 

2nd  generation  Wireless  Telecommunications 

3G 

3rd  generation  Wireless  Telecommunications 

4G  LTE 

4th  generation  Long-Term  Evolution  Wireless  Telecommunications 

AES 

Advanced  Encryption  Standard 

AND 

logical  And  operation 

ANSI  C 

American  National  Standards  Institute  Standard  for  the  C  programming  language. 

ASCII 

American  Standard  Code  for  Information  Interchange  is  a  character-encoding 
scheme. 

AWS 

Amazon  Web  Services  (Cloud  Computing  Service) 

AXI

Advanced eXtensible Interface. An open standard hardware interconnection bus
used in FPGA designs.

AXI4

Advanced eXtensible Interface, 4th generation. An open standard hardware
interconnection bus used in FPGA designs.

BRAM 

Block  RAM  (in  an  FPGA) 

CEM 

ComposedEvalMult

CPU 

Central  Processing  Unit 

CRT 

Chinese  Remainder  Transform 

DDoS 

Distributed  Denial  of  Service  Computer  Network  Attack 

DDR3

Double data rate type three synchronous dynamic random-access memory

DMA

Direct Memory Access (controller)

DSL 

Digital  subscriber  line  computer  communications  over  telephone  lines 

EC2 

Amazon  Elastic  Compute  Cloud  (Cloud  Computing  Service) 

EKWS 

encrypted  keyword  search 

FFT 

Fast  Fourier  Transform 

FHE 

Fully Homomorphic Encryption

FHEPU

Fully Homomorphic Encryption Processing Unit

FPGA 

Field  Programmable  Gate  Array 

GB 

Gigabytes 

gcc 

GNU  C  Compiler 

Gen 

Generation 

GMP 

GNU  Multiple  Precision  Arithmetic  Library 

GNU 

Open source software consortium: GNU's Not Unix

GPGPU 

General  Purpose  Graphical  Processing  unit 

GPL 

General  Public  License 

GPU 

Graphical  Processing  Unit 

HDL 

Hardware Description Language (e.g., Verilog or VHDL)

HE 

Homomorphic  Encryption 
HELib

Homomorphic  Encryption  Library  (originally)  from  Shai  Halevi  and  Victor  Shoup 

I/O 

Input  /  Output 

ICRT 

Inverse  Chinese  Remainder  Transform 

iOS 

iPhone  Operating  System 

IP 

In  an  FPGA  context:  Intellectual  property,  in  a  network  context:  Internet  Protocol 

kbs 

kilo-bits  per  second 

KHz 

kilohertz 

KWS 

keyword  search 

LTV 

Lopez-Alt,  Tromer  and  Vaikuntanathan  crypto  scheme 

Mb/sec 

megabit  per  second 

MOD  or  mod 

modulo  operation 

MPC 

Multi  Party  Computation 

mSec 

millisecond 

NAND 

logical  Nand  Function  (Not(And)) 

NOT 

logical  Not  Function 

NTRU 

NTRU  is  a  patented  and  open  source  public-key  cryptosystem  that  uses  lattice- 
based  cryptography  to  encrypt  and  decrypt  data.  Also  "Number  Theoretics  R  Us" 

NTT 

Number  Theoretic  Transform 

OR 

logical  Or  operation 

PC 

Personal  Computer 

PCI 

Peripheral  Component  Interconnect 

PCIe 

Peripheral  Component  Interconnect  Express 

PSTN 

Public  Switched  Telephone  Network 

RAM 

Random  Access  Memory 

ROM 

Read  Only  Memory 

SGMII 

serial  gigabit  media-independent  interface  (Gigabit  Ethernet  Physical  Layer) 

SHE 

Somewhat  Homomorphic  Encryption 

SIPHER 

Scalable  Implementation  of  Primitives  for  Homomorphic  EncRyption 

SMC 

Secure  Multiparty  Computation 

TCP/IP 

Transmission  Control  Protocol/Internet  Protocol 

VHDL 

VHSIC  Hardware  Description  Language 

VHSIC 

Very  High  Speed  Integrated  Circuit 

VOIP/ VoIP 

voice  over  Internet  Protocol 

XOR 

logical  exclusive  Or  operation 